Foreword



This Technical Specification has been produced by the 3rd Generation Partnership Project (3GPP).

The present document describes the detailed mapping from input blocks of 160 speech samples in 13-bit uniform
PCM format to encoded blocks of 244 bits, and from encoded blocks of 244 bits to output blocks of 160 reconstructed
speech samples, within the digital cellular telecommunications system.

The contents of the present document are subject to continuing work within the TSG and may change following formal 
TSG approval. Should the TSG modify the contents of the present document, it will be re-released by the TSG with an 
identifying change of release date and an increase in version number as follows: 

Version x.y.z 

where: 

x the first digit:

1 presented to TSG for information; 

2 presented to TSG for approval; 

3 or greater indicates TSG approved document under change control. 

y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, 
updates, etc. 

z the third digit is incremented when editorial only changes have been incorporated in the document. 
1 Scope



The present document describes the detailed mapping from input blocks of 160 speech samples in 13-bit uniform
PCM format to encoded blocks of 244 bits, and from encoded blocks of 244 bits to output blocks of 160 reconstructed
speech samples. The sampling rate is 8 000 sample/s, leading to a bit rate for the encoded bit stream of 12.2 kbit/s. The
coding scheme is the so-called Algebraic Code Excited Linear Prediction Coder, hereafter referred to as ACELP.

The present document also specifies the conversion between A-law or μ-law (PCS 1900) PCM and 13-bit uniform
PCM. Performance requirements for the audio input and output parts are included only to the extent that they affect the
transcoder performance. The present document also describes the codec down to the bit level, thus enabling the
verification of compliance with it to a high degree of confidence by use of a set of digital test sequences. These test
sequences are described in GSM 06.54 [7] and are available on disks.

In case of discrepancy between the requirements described in the present document and the fixed point computational 
description (ANSI-C code) of these requirements contained in GSM 06.53 [6], the description in GSM 06.53 [6] will 
prevail. 

The transcoding procedure specified in the present document is applicable for the enhanced full rate speech traffic 
channel (TCH) in the GSM system. 

In GSM 06.51 [5], a reference configuration for the speech transmission chain of the GSM enhanced full rate (EFR) 
system is shown. According to this reference configuration, the speech encoder takes its input as a 13-bit uniform PCM 
signal either from the audio part of the Mobile Station or on the network side, from the PSTN via an 8-bit A-law or
μ-law (PCS 1900) to 13-bit uniform PCM conversion. The encoded speech at the output of the speech encoder is
delivered to a channel encoder unit which is specified in GSM 05.03 [3]. In the receive direction, the inverse operations 
take place. 



2 References



The following documents contain provisions which, through reference in this text, constitute provisions of the present 
document. 

• References are either specific (identified by date of publication, edition number, version number, etc.) or 
non-specific. 

• For a specific reference, subsequent revisions do not apply. 

• For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including 
a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same 
Release as the present document. 

[1] GSM 01.04: "Digital cellular telecommunications system (Phase 2+); Abbreviations and acronyms".

[2] GSM 03.50: "Digital cellular telecommunications system (Phase 2+); Transmission planning aspects of the speech service in the GSM Public Land Mobile Network (PLMN) system".

[3] GSM 05.03: "Digital cellular telecommunications system (Phase 2+); Channel coding".

[4] GSM 06.32: "Digital cellular telecommunications system (Phase 2+); Voice Activity Detection (VAD)".

[5] GSM 06.51: "Digital cellular telecommunications system (Phase 2+); Enhanced Full Rate (EFR) speech processing functions; General description".

[6] GSM 06.53: "Digital cellular telecommunications system (Phase 2+); ANSI-C code for the GSM Enhanced Full Rate (EFR) speech codec".

[7] GSM 06.54: "Digital cellular telecommunications system (Phase 2+); Test vectors for the GSM Enhanced Full Rate (EFR) speech codec".

[8] ITU-T Recommendation G.711 (1988): "Coding of analogue signals by pulse code modulation; Pulse code modulation (PCM) of voice frequencies".

[9] ITU-T Recommendation G.726: "40, 32, 24, 16 kbit/s adaptive differential pulse code modulation (ADPCM)".



3 Definitions, symbols and abbreviations 

3.1 Definitions 

For the purposes of the present document, the following terms and definitions apply: 

adaptive codebook: the adaptive codebook contains excitation vectors that are adapted for every subframe. The adaptive
codebook is derived from the long term filter state. The lag value can be viewed as an index into the adaptive codebook.

adaptive postfilter: this filter is applied to the output of the short term synthesis filter to enhance the perceptual quality 
of the reconstructed speech. In the GSM enhanced full rate codec, the adaptive postfilter is a cascade of two filters: a 
formant postfilter and a tilt compensation filter. 

algebraic codebook: fixed codebook where algebraic code is used to populate the excitation vectors (innovation 
vectors). The excitation contains a small number of nonzero pulses with predefined interlaced sets of positions. 

closed-loop pitch analysis: this is the adaptive codebook search, i.e., a process of estimating the pitch (lag) value from
the weighted input speech and the long term filter state. In the closed-loop search, the lag is searched using an error
minimization loop (analysis-by-synthesis). In the GSM enhanced full rate codec, closed-loop pitch search is performed
for every subframe.

direct form coefficients: one of the formats for storing the short term filter parameters. In the GSM enhanced full rate 
codec, all filters which are used to modify speech samples use direct form coefficients. 

fixed codebook: the fixed codebook contains excitation vectors for speech synthesis filters. The contents of the codebook
are non-adaptive (i.e., fixed). In the GSM enhanced full rate codec, the fixed codebook is implemented using an
algebraic codebook.

fractional lags: set of lag values having sub-sample resolution. In the GSM enhanced full rate codec a sub-sample
resolution of 1/6th of a sample is used.

frame: time interval equal to 20 ms (160 samples at an 8 kHz sampling rate). 

integer lags: set of lag values having whole sample resolution. 

interpolating filter: FIR filter used to produce an estimate of sub-sample resolution samples, given an input sampled 
with integer sample resolution. 

inverse filter: this filter removes the short term correlation from the speech signal. The filter models an inverse 
frequency response of the vocal tract. 

lag: long term filter delay. This is typically the true pitch period, or a multiple or sub-multiple of it. 

Line Spectral Frequencies: (see Line Spectral Pair).

Line Spectral Pair: transformation of LPC parameters. Line Spectral Pairs are obtained by decomposing the inverse
filter transfer function A(z) into a set of two transfer functions, one having even symmetry and the other having odd
symmetry. The Line Spectral Pairs (also called Line Spectral Frequencies) are the roots of these polynomials on the
z-unit circle.

LP analysis window: for each frame, the short term filter coefficients are computed using the high pass filtered speech
samples within the analysis window. In the GSM enhanced full rate codec, the length of the analysis window is 240
samples. For each frame, two asymmetric windows are used to generate two sets of LP coefficients. No samples of the
future frames are used (no lookahead).

LP coefficients: Linear Prediction (LP) coefficients (also referred to as Linear Predictive Coding (LPC) coefficients) is a
generic descriptive term for the short term filter coefficients.

open-loop pitch search: process of estimating the near optimal lag directly from the weighted speech input. This is 
done to simplify the pitch analysis and confine the closed-loop pitch search to a small number of lags around the 
open-loop estimated lags. In the GSM enhanced full rate codec, open-loop pitch search is performed every 10 ms. 

residual: output signal resulting from an inverse filtering operation. 

short term synthesis filter: this filter introduces, into the excitation signal, short term correlation which models the 
impulse response of the vocal tract. 

perceptual weighting filter: this filter is employed in the analysis-by-synthesis search of the codebooks. The filter 
exploits the noise masking properties of the formants (vocal tract resonances) by weighting the error less in regions near 
the formant frequencies and more in regions away from them. 

subframe: time interval equal to 5 ms (40 samples at an 8 kHz sampling rate). 

vector quantization: method of grouping several parameters into a vector and quantizing them simultaneously. 

zero input response: output of a filter due to past inputs, i.e. due to the present state of the filter, given that an input of 
zeros is applied. 

zero state response: output of a filter due to the present input, given that no past inputs have been applied, i.e., given 
the state information in the filter is all zeroes. 

3.2 Symbols 

For the purposes of the present document, the following symbols apply: 

$A(z)$ The inverse filter with unquantized coefficients

$\hat{A}(z)$ The inverse filter with quantified coefficients

$H(z) = 1/\hat{A}(z)$ The speech synthesis filter with quantified coefficients

$a_i$ The unquantized linear prediction parameters (direct form coefficients)

$\hat{a}_i$ The quantified linear prediction parameters

$m$ The order of the LP model

$1/B(z)$ The long-term synthesis filter

$W(z)$ The perceptual weighting filter (unquantized coefficients)

$\gamma_1, \gamma_2$ The perceptual weighting factors

$F_E(z)$ Adaptive pre-filter

$T$ The nearest integer pitch lag to the closed-loop fractional pitch lag of the subframe

$\beta$ The adaptive pre-filter coefficient (the quantified pitch gain)

$H_f(z) = \hat{A}(z/\gamma_n)/\hat{A}(z/\gamma_d)$ The formant postfilter

$\gamma_n$ Control coefficient for the amount of the formant post-filtering

$\gamma_d$ Control coefficient for the amount of the formant post-filtering

$H_t(z)$ Tilt compensation filter
$\gamma_t$ Control coefficient for the amount of the tilt compensation filtering

$\mu = \gamma_t k_1$ A tilt factor, with $k_1$ being the first reflection coefficient

$h_f(n)$ The truncated impulse response of the formant postfilter

$L_h$ The length of $h_f(n)$

$r_h(i)$ The auto-correlations of $h_f(n)$

$\hat{A}(z/\gamma_n)$ The inverse filter (numerator) part of the formant postfilter

$1/\hat{A}(z/\gamma_d)$ The synthesis filter (denominator) part of the formant postfilter

$\hat{r}(n)$ The residual signal of the inverse filter $\hat{A}(z/\gamma_n)$

$h_t(z)$ Impulse response of the tilt compensation filter

$\beta_{sc}(n)$ The AGC-controlled gain scaling factor of the adaptive postfilter

$\alpha$ The AGC factor of the adaptive postfilter

$H_{h1}(z)$ Pre-processing high-pass filter

$w_I(n), w_{II}(n)$ LP analysis windows

$L_1^{(I)}$ Length of the first part of the LP analysis window $w_I(n)$

$L_2^{(I)}$ Length of the second part of the LP analysis window $w_I(n)$

$L_1^{(II)}$ Length of the first part of the LP analysis window $w_{II}(n)$

$L_2^{(II)}$ Length of the second part of the LP analysis window $w_{II}(n)$

$r_{ac}(k)$ The auto-correlations of the windowed speech $s'(n)$

$w_{lag}(i)$ Lag window for the auto-correlations (60 Hz bandwidth expansion)

$f_0$ The bandwidth expansion in Hz

$f_s$ The sampling frequency in Hz

$r'_{ac}(k)$ The modified (bandwidth expanded) auto-correlations

$E_{LD}(i)$ The prediction error in the $i$th iteration of the Levinson algorithm

$k_i$ The $i$th reflection coefficient

$a_j^{(i)}$ The $j$th direct form coefficient in the $i$th iteration of the Levinson algorithm

$F_1'(z)$ Symmetric LSF polynomial

$F_2'(z)$ Antisymmetric LSF polynomial

$F_1(z)$ Polynomial $F_1'(z)$ with root $z = -1$ eliminated

$F_2(z)$ Polynomial $F_2'(z)$ with root $z = 1$ eliminated

$q_i$ The line spectral pairs (LSPs) in the cosine domain

$\mathbf{q}$ An LSP vector in the cosine domain

$\hat{\mathbf{q}}_i^{(n)}$ The quantified LSP vector at the $i$th subframe of the frame $n$

$\omega_i$ The line spectral frequencies (LSFs)

$T_m(x)$ An $m$th order Chebyshev polynomial

$f_1(i), f_2(i)$ The coefficients of the polynomials $F_1(z)$ and $F_2(z)$

$f_1'(i), f_2'(i)$ The coefficients of the polynomials $F_1'(z)$ and $F_2'(z)$

$f(i)$ The coefficients of either $F_1(z)$ or $F_2(z)$

$C(x)$ Sum polynomial of the Chebyshev polynomials

$x$ Cosine of angular frequency $\omega$
$\lambda_k$ Recursion coefficients for the Chebyshev polynomial evaluation

$f_i$ The line spectral frequencies (LSFs) in Hz

$\mathbf{f}^t = [f_1\ f_2 \ldots f_{10}]$ The vector representation of the LSFs in Hz

$\mathbf{z}^{(1)}(n), \mathbf{z}^{(2)}(n)$ The mean-removed LSF vectors at frame $n$

$\mathbf{r}^{(1)}(n), \mathbf{r}^{(2)}(n)$ The LSF prediction residual vectors at frame $n$

$\mathbf{p}(n)$ The predicted LSF vector at frame $n$

$\hat{\mathbf{r}}^{(2)}(n-1)$ The quantified second residual vector at the past frame

$\hat{\mathbf{f}}^k$ The quantified LSF vector at quantization index $k$

$E_{LSP}$ The LSP quantization error

$w_i, i = 1,\ldots,10$ LSP-quantization weighting factors

$d_i$ The distance between the line spectral frequencies $f_{i+1}$ and $f_{i-1}$

$h(n)$ The impulse response of the weighted synthesis filter

$O_k$ The correlation maximum of open-loop pitch analysis at delay $k$

$O_{t_i}, i = 1,\ldots,3$ The correlation maxima at delays $t_i, i = 1,\ldots,3$

$(M_i, t_i), i = 1,\ldots,3$ The normalized correlation maxima $M_i$ and the corresponding delays $t_i, i = 1,\ldots,3$

$H(z)W(z) = A(z/\gamma_1)/[\hat{A}(z)A(z/\gamma_2)]$ The weighted synthesis filter

$A(z/\gamma_1)$ The numerator of the perceptual weighting filter

$1/A(z/\gamma_2)$ The denominator of the perceptual weighting filter

$T_1$ The nearest integer to the fractional pitch lag of the previous (1st or 3rd) subframe

$s'(n)$ The windowed speech signal

$s_w(n)$ The weighted speech signal

$\hat{s}(n)$ Reconstructed speech signal

$\hat{s}'(n)$ The gain-scaled post-filtered signal

$\hat{s}_f(n)$ Post-filtered speech signal (before scaling)

$x(n)$ The target signal for adaptive codebook search

$x_2(n), \mathbf{x}_2^t$ The target signal for algebraic codebook search

$res_{LP}(n)$ The LP residual signal

$c(n)$ The fixed codebook vector

$v(n)$ The adaptive codebook vector

$y(n) = v(n) * h(n)$ The filtered adaptive codebook vector

$y_k(n)$ The past filtered excitation

$u(n)$ The excitation signal

$\hat{u}(n)$ The emphasized adaptive codebook vector

$\hat{u}'(n)$ The gain-scaled emphasized excitation signal

$T_{op}$ The best open-loop lag

$t_{min}$ Minimum lag search value

$t_{max}$ Maximum lag search value

$R(k)$ Correlation term to be maximized in the adaptive codebook search

$b_{24}$ The FIR filter for interpolating the normalized correlation term $R(k)$

$R(k)_t$ The interpolated value of $R(k)$ for the integer delay $k$ and fraction $t$
$b_{60}$ The FIR filter for interpolating the past excitation signal $u(n)$ to yield the adaptive codebook vector $v(n)$

$A_k$ Correlation term to be maximized in the algebraic codebook search at index $k$

$C_k$ The correlation in the numerator of $A_k$ at index $k$

$E_{D_k}$ The energy in the denominator of $A_k$ at index $k$

$\mathbf{d} = \mathbf{H}^t \mathbf{x}_2$ The correlation between the target signal $x_2(n)$ and the impulse response $h(n)$, i.e., backward filtered target

$\mathbf{H}$ The lower triangular Toeplitz convolution matrix with diagonal $h(0)$ and lower diagonals $h(1),\ldots,h(39)$

$\boldsymbol{\Phi} = \mathbf{H}^t \mathbf{H}$ The matrix of correlations of $h(n)$

$d(n)$ The elements of the vector $\mathbf{d}$

$\phi(i,j)$ The elements of the symmetric matrix $\boldsymbol{\Phi}$

$\mathbf{c}_k$ The innovation vector

$C$ The correlation in the numerator of $A_k$

$m_i$ The position of the $i$th pulse

$\vartheta_i$ The amplitude of the $i$th pulse

$N_p$ The number of pulses in the fixed codebook excitation

$E_D$ The energy in the denominator of $A_k$

$res_{LTP}(n)$ The normalized long-term prediction residual

$b(n)$ The sum of the normalized $d(n)$ vector and normalized long-term prediction residual $res_{LTP}(n)$

$s_b(n)$ The sign signal for the algebraic codebook search

$d'(n)$ Sign extended backward filtered target

$\phi'(i,j)$ The modified elements of the matrix $\boldsymbol{\Phi}$, including sign information

$\mathbf{z}^t, z(n)$ The fixed codebook vector convolved with $h(n)$

$E(n)$ The mean-removed innovation energy (in dB)

$\bar{E}$ The mean of the innovation energy

$\tilde{E}(n)$ The predicted energy

$[b_1\ b_2\ b_3\ b_4]$ The MA prediction coefficients

$\hat{R}(k)$ The quantified prediction error at subframe $k$

$E_I$ The mean innovation energy

$R(n)$ The prediction error of the fixed-codebook gain quantization

$E_Q$ The quantization error of the fixed-codebook gain quantization

$e(n)$ The states of the synthesis filter $1/\hat{A}(z)$

$e_w(n)$ The perceptually weighted error of the analysis-by-synthesis search

$\eta$ The gain scaling factor for the emphasized excitation

$g_c$ The fixed-codebook gain

$g_c'$ The predicted fixed-codebook gain

$\hat{g}_c$ The quantified fixed codebook gain

$g_p$ The adaptive codebook gain

$\hat{g}_p$ The quantified adaptive codebook gain

$\gamma_{gc} = g_c/g_c'$ A correction factor between the gain $g_c$ and the estimated one $g_c'$

$\hat{\gamma}_{gc}$ The optimum value for $\gamma_{gc}$

$\gamma_{sc}$ Gain scaling factor

3.3 Abbreviations 

For the purposes of the present document, the following abbreviations apply. Further GSM related abbreviations may be 
found in GSM 01.04 [1]. 

ACELP Algebraic Code Excited Linear Prediction 

AGC Adaptive Gain Control 

CELP Code Excited Linear Prediction 

FIR Finite Impulse Response 

ISPP Interleaved Single-Pulse Permutation 

LP Linear Prediction 

LPC Linear Predictive Coding 

LSF Line Spectral Frequency 

LSP Line Spectral Pair 

LTP Long Term Predictor (or Long Term Prediction) 

MA Moving Average 



4 Outline description 

The present document is structured as follows. 

Subclause 4.1 contains a functional description of the audio parts including the A/D and D/A functions. Subclause 4.2
describes the conversion between 13-bit uniform and 8-bit A-law or μ-law (PCS 1900) samples. Subclauses 4.3 and 4.4
present a simplified description of the principles of the GSM EFR encoding and decoding process respectively. In
subclause 4.5, the sequence and subjective importance of encoded parameters are given.

Clause 5 presents the functional description of the GSM EFR encoding, whereas clause 6 describes the decoding 
procedures. Clause 7 describes variables, constants and tables of the C-code of the GSM EFR codec. 

4.1 Functional description of audio parts 

The analogue-to-digital and digital-to-analogue conversion will in principle comprise the following elements:

1) analogue to uniform digital PCM:

- microphone;

- input level adjustment device;

- input anti-aliasing filter;

- sample-hold device sampling at 8 kHz;

- analogue-to-uniform digital conversion to 13-bit representation.

The uniform format shall be represented in two's complement.

2) uniform digital PCM to analogue:

- conversion from 13-bit/8 kHz uniform PCM to analogue;

- a hold device;

- reconstruction filter including x/sin(x) correction;

- output level adjustment device;

- earphone or loudspeaker.

In the terminal equipment, the A/D function may be achieved either:

- by direct conversion to 13-bit uniform PCM format;

- or by conversion to 8-bit A-law or μ-law (PCS 1900) companded format, based on a standard A-law or μ-law
(PCS 1900) codec/filter according to ITU-T Recommendations G.711 [8] and G.714, followed by the
8-bit to 13-bit conversion as specified in subclause 4.2.1.

For the D/A operation, the inverse operations take place. 

In the latter case it should be noted that the specifications in ITU-T G.714 (superseded by G.712) are concerned with 
PCM equipment located in the central parts of the network. When used in the terminal equipment, the present document 
does not on its own ensure sufficient out-of-band attenuation. The specification of out-of-band signals is defined in 
GSM 03.50 [2] in clause 2. 



4.2 Preparation of speech samples 



The encoder is fed with data comprising samples with a resolution of 13 bits left justified in a 16-bit word. The three
least significant bits are set to '0'. The decoder outputs data in the same format. Outside the speech codec, further
processing must be applied if the traffic data occurs in a different representation.
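The helpers below illustrate this sample format: a 13-bit two's-complement value occupies the upper 13 bits of a 16-bit word, with the three least significant bits cleared. This is a minimal sketch; the function names are hypothetical and are not taken from GSM 06.53 [6].

```c
#include <stdint.h>

/* Hypothetical helper: place a 13-bit two's-complement sample (-4096..4095)
   in the upper 13 bits of a 16-bit word; the three LSBs become '0'. */
int16_t pack13(int16_t s13)
{
    /* multiply rather than shift, so negative values are well defined */
    return (int16_t)(s13 * 8);
}

/* Hypothetical helper: bring an arbitrary 16-bit sample into the codec
   format by clearing its three least significant bits. */
int16_t to_codec_format(int16_t x)
{
    return (int16_t)(x & ~7);
}

/* Hypothetical helper: recover the 13-bit value from a codec-format word. */
int16_t unpack13(int16_t w)
{
    /* exact division, because the three LSBs of a codec word are zero */
    return (int16_t)(w / 8);
}
```

For example, the largest 13-bit value 4095 becomes the 16-bit word 32760, and the smallest, -4096, becomes -32768.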

4.2.1 PCM format conversion 

The conversion between 8-bit A-law or μ-law (PCS 1900) compressed data and linear data with 13-bit resolution at the
speech encoder input shall be as defined in ITU-T Recommendation G.711 [8].

ITU-T Recommendation G.711 [8] specifies the A-law or μ-law (PCS 1900) to linear conversion and vice versa by
providing table entries. Examples of how to perform the conversion by fixed-point arithmetic can be found in ITU-T
Recommendation G.726 [9]. Subclause 4.2.1 of G.726 [9] describes A-law and μ-law (PCS 1900) to linear expansion,
and subclause 4.2.7 of G.726 [9] provides a solution for linear to A-law and μ-law (PCS 1900) compression.
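As a sketch of table-free A-law expansion, the function below follows the structure of widely circulated public-domain G.711 conversion routines rather than the G.726 [9] clause text, and its name is hypothetical. Its output matches the encoder input format of subclause 4.2: a 13-bit value left-justified in a 16-bit word with the three least significant bits at zero. Only the A-law expansion direction is shown.

```c
#include <stdint.h>

/* Hypothetical A-law (G.711) to linear expansion without lookup tables.
   Returns a 13-bit value left-justified in a 16-bit word (three LSBs zero). */
int16_t alaw_to_linear(uint8_t a_val)
{
    int t, seg;

    a_val ^= 0x55;                /* undo the even-bit inversion used by A-law */
    t = (a_val & 0x0F) << 4;      /* quantization step inside the segment */
    seg = (a_val & 0x70) >> 4;    /* segment number 0..7 */
    switch (seg) {
    case 0:
        t += 8;                   /* half a step: segment 0 is linear */
        break;
    case 1:
        t += 0x108;               /* half a step plus the segment offset */
        break;
    default:
        t += 0x108;
        t <<= seg - 1;            /* scale for the higher segments */
        break;
    }
    return (int16_t)((a_val & 0x80) ? t : -t);  /* bit 7 set means positive */
}
```

With this convention the codes 0xD5 and 0x55 expand to +8 and -8 (the smallest magnitudes), and 0xAA expands to 32256, i.e. 4032 in 13-bit units.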

4.3 Principles of the GSM enhanced full rate speech encoder 

The codec is based on the code-excited linear predictive (CELP) coding model. A 10th order linear prediction (LP), or
short-term, synthesis filter is used which is given by:

$$H(z) = \frac{1}{\hat{A}(z)} = \frac{1}{1 + \sum_{i=1}^{m} \hat{a}_i z^{-i}}, \tag{1}$$

where $\hat{a}_i, i = 1,\ldots,m,$ are the (quantified) linear prediction (LP) parameters, and $m = 10$ is the predictor order. The
long-term, or pitch, synthesis filter is given by:

$$\frac{1}{B(z)} = \frac{1}{1 - g_p z^{-T}}, \tag{2}$$

where $T$ is the pitch delay and $g_p$ is the pitch gain. The pitch synthesis filter is implemented using the so-called
adaptive codebook approach.

The CELP speech synthesis model is shown in figure 2. In this model, the excitation signal at the input of the short-term 
LP synthesis filter is constructed by adding two excitation vectors from adaptive and fixed (innovative) codebooks. The 
speech is synthesized by feeding the two properly chosen vectors from these codebooks through the short-term 
synthesis filter. The optimum excitation sequence in a codebook is chosen using an analysis-by-synthesis search 
procedure in which the error between the original and synthesized speech is minimized according to a perceptually 
weighted distortion measure. 
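A floating-point sketch of the synthesis model of equations (1) and (2) follows: the excitation is formed as $u(n) = g_p v(n) + g_c c(n)$ from the adaptive and fixed codevectors, then filtered through $1/\hat{A}(z)$. It assumes the adaptive codevector $v(n)$ has already been built from the past excitation (fractional-lag interpolation is not shown), and it does not reproduce the bit-exact fixed-point arithmetic of GSM 06.53 [6]. The function names are hypothetical.

```c
#define M 10          /* LP order m */
#define L_SUBFR 40    /* subframe length */

/* Excitation u(n) = gp * v(n) + gc * c(n) for one subframe. */
void build_excitation(const double v[L_SUBFR], const double c[L_SUBFR],
                      double gp, double gc, double u[L_SUBFR])
{
    for (int n = 0; n < L_SUBFR; n++)
        u[n] = gp * v[n] + gc * c[n];
}

/* Filter u(n) through 1/A(z) = 1 / (1 + sum_{i=1..M} a[i] z^-i), a[0] = 1:
   s(n) = u(n) - sum_{i=1..M} a[i] s(n-i).
   mem[M-1] holds s(-1), mem[M-2] holds s(-2), ..., mem[0] holds s(-M);
   on return mem[] holds the last M outputs in the same layout. */
void synthesis_filter(const double a[M + 1], const double *u, double *s,
                      int len, double mem[M])
{
    for (int n = 0; n < len; n++) {
        double acc = u[n];
        for (int i = 1; i <= M; i++) {
            double past = (n - i >= 0) ? s[n - i] : mem[M + n - i];
            acc -= a[i] * past;
        }
        s[n] = acc;
    }
    for (int i = 0; i < M; i++) {
        int k = len - M + i;   /* time index, relative to this call, of new mem[i] */
        mem[i] = (k >= 0) ? s[k] : mem[i + len];
    }
}
```

Carrying `mem[]` from one call to the next is what keeps the filter continuous across the 40-sample subframes.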
The perceptual weighting filter used in the analysis-by-synthesis search technique is given by:

$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)}, \tag{3}$$

where $A(z)$ is the unquantized LP filter and $0 < \gamma_2 < \gamma_1 \le 1$ are the perceptual weighting factors. The values
$\gamma_1 = 0.9$ and $\gamma_2 = 0.6$ are used. The weighting filter uses the unquantized LP parameters while the formant
synthesis filter uses the quantified ones.
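Since $A(z/\gamma) = 1 + \sum_{i=1}^{m} a_i \gamma^i z^{-i}$, each weighted polynomial is obtained by scaling the $i$th coefficient by $\gamma^i$. The sketch below builds both polynomials and applies $W(z)$ as a pole-zero filter starting from zero state; a real encoder carries the filter memories across subframes, which is omitted here. The function names are hypothetical.

```c
#define M 10

/* Coefficients of A(z/gamma): ap[i] = a[i] * gamma^i, with a[0] = 1. */
void weight_coeffs(const double a[M + 1], double gamma, double ap[M + 1])
{
    double g = 1.0;
    for (int i = 0; i <= M; i++) {
        ap[i] = a[i] * g;
        g *= gamma;
    }
}

/* Apply W(z) = A(z/g1) / A(z/g2) to x[0..len-1], starting from zero state. */
void weighting_filter(const double a[M + 1], double g1, double g2,
                      const double *x, double *y, int len)
{
    double num[M + 1], den[M + 1];
    weight_coeffs(a, g1, num);
    weight_coeffs(a, g2, den);
    for (int n = 0; n < len; n++) {
        double acc = 0.0;
        for (int i = 0; i <= M && i <= n; i++)
            acc += num[i] * x[n - i];      /* numerator A(z/g1): FIR part */
        for (int i = 1; i <= M && i <= n; i++)
            acc -= den[i] * y[n - i];      /* denominator 1/A(z/g2): recursive part */
        y[n] = acc;
    }
}
```

With the values of the present document the call would be `weighting_filter(a, 0.9, 0.6, x, y, len)`.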

The coder operates on speech frames of 20 ms corresponding to 160 samples at the sampling frequency of 8 000
sample/s. For each frame of 160 speech samples, the speech signal is analysed to extract the parameters of the CELP
model (LP filter coefficients, adaptive and fixed codebooks' indices and gains). These parameters are encoded and
transmitted. At the decoder, these parameters are decoded and speech is synthesized by filtering the reconstructed
excitation signal through the LP synthesis filter.

The signal flow at the encoder is shown in figure 3. LP analysis is performed twice per frame. The two sets of LP 
parameters are converted to line spectrum pairs (LSP) and jointly quantified using split matrix quantization (SMQ) with 
38 bits. The speech frame is divided into 4 subframes of 5 ms each (40 samples). The adaptive and fixed codebook 
parameters are transmitted every subframe. The two sets of quantified and unquantized LP filters are used for the 
second and fourth subframes while in the first and third subframes interpolated LP filters are used (both quantified and 
unquantized). An open-loop pitch lag is estimated twice per frame (every 10 ms) based on the perceptually weighted 
speech signal. 

Then the following operations are repeated for each subframe:

- The target signal $x(n)$ is computed by filtering the LP residual through the weighted synthesis filter
$W(z)H(z)$ with the initial states of the filters having been updated by filtering the error between LP residual
and excitation (this is equivalent to the common approach of subtracting the zero input response of the weighted
synthesis filter from the weighted speech signal).

- The impulse response $h(n)$ of the weighted synthesis filter is computed.

- Closed-loop pitch analysis is then performed (to find the pitch lag and gain), using the target $x(n)$ and impulse
response $h(n)$, by searching around the open-loop pitch lag. Fractional pitch with 1/6th of a sample resolution
is used. The pitch lag is encoded with 9 bits in the first and third subframes and with 6 bits, relative to the lag of
the previous subframe, in the second and fourth subframes.

- The target signal $x(n)$ is updated by removing the adaptive codebook contribution (filtered adaptive
codevector), and this new target, $x_2(n)$, is used in the fixed algebraic codebook search (to find the optimum
innovation). An algebraic codebook with 35 bits is used for the innovative excitation.

- The gains of the adaptive and fixed codebook are scalar quantified with 4 and 5 bits respectively (with moving
average (MA) prediction applied to the fixed codebook gain).

- Finally, the filter memories are updated (using the determined excitation signal) for finding the target signal in
the next subframe.

The bit allocation of the codec is shown in table 1. In each 20 ms speech frame, 244 bits are produced, corresponding to 
a bit rate of 12.2 kbit/s. More detailed bit allocation is available in table 6. Note that the most significant bits (MSB) are 
always sent first. 
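To make the bit budget concrete, the sketch below sums the per-subframe allocations of table 1 and shows a most-significant-bit-first bit writer. The order used in the sum follows the rows of table 1 only; the actual transmission order of the bits s1 to s244 is given in table 6. The function names are hypothetical.

```c
#include <stdint.h>

/* Bits per frame from table 1: the LSPs once per frame, then per subframe
   pitch delay (9 or 6 bits), pitch gain (4), algebraic code (35) and
   codebook gain (5). */
int frame_bit_budget(void)
{
    const int pitch_delay_bits[4] = {9, 6, 9, 6};
    int total = 38;                                 /* 2 LSP sets */
    for (int sf = 0; sf < 4; sf++)
        total += pitch_delay_bits[sf] + 4 + 35 + 5;
    return total;
}

/* Hypothetical bit writer: append the nbits (<= 32) low-order bits of value
   to frame[], most significant bit first. *pos counts the bits already
   written; frame[] must be zero-initialized. */
void put_bits(uint8_t *frame, int *pos, uint32_t value, int nbits)
{
    for (int b = nbits - 1; b >= 0; b--) {
        if ((value >> b) & 1u)
            frame[*pos / 8] |= (uint8_t)(0x80u >> (*pos % 8));
        (*pos)++;
    }
}
```

The budget evaluates to 38 + 30 + 16 + 140 + 20 = 244 bits, and 244 bits every 20 ms (50 frames per second) gives 12 200 bit/s.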
Table 1: Bit allocation of the 12.2 kbit/s coding algorithm for 20 ms frame

Parameter          1st & 3rd subframes    2nd & 4th subframes    total per frame
2 LSP sets                                                       38
Pitch delay        9                      6                      30
Pitch gain         4                      4                      16
Algebraic code     35                     35                     140
Codebook gain      5                      5                      20
Total                                                            244



4.4 Principles of the GSM enhanced full rate speech decoder

The signal flow at the decoder is shown in figure 4. At the decoder, the transmitted indices are extracted from the 
received bitstream. The indices are decoded to obtain the coder parameters at each transmission frame. These 
parameters are the two LSP vectors, the 4 fractional pitch lags, the 4 innovative codevectors, and the 4 sets of pitch and 
innovative gains. The LSP vectors are converted to the LP filter coefficients and interpolated to obtain LP filters at each 
subframe. Then, at each 40-sample subframe: 

- the excitation is constructed by adding the adaptive and innovative codevectors scaled by their respective gains;

- the speech is reconstructed by filtering the excitation through the LP synthesis filter.

Finally, the reconstructed speech signal is passed through an adaptive postfilter. 
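The per-subframe interpolation of the two decoded LSP vectors can be sketched as follows. This assumes midpoint interpolation in the cosine (LSP) domain for the first and third subframes, with the two decoded vectors used directly in the second and fourth subframes; the 0.5 weights are an assumption of this sketch, and the conversion of each vector back to LP coefficients is not shown. The function name is hypothetical.

```c
#define M 10

/* Hypothetical sketch: LSP vectors (cosine domain) for the four subframes.
   q_prev4 is the LSP vector of subframe 4 of the previous frame; q2 and q4
   are the two LSP vectors decoded for the current frame. The 0.5 weights
   for subframes 1 and 3 are an assumption. */
void interpolate_lsp(const double q_prev4[M], const double q2[M],
                     const double q4[M], double q_sub[4][M])
{
    for (int i = 0; i < M; i++) {
        q_sub[0][i] = 0.5 * q_prev4[i] + 0.5 * q2[i];   /* subframe 1 */
        q_sub[1][i] = q2[i];                            /* subframe 2 */
        q_sub[2][i] = 0.5 * q2[i] + 0.5 * q4[i];        /* subframe 3 */
        q_sub[3][i] = q4[i];                            /* subframe 4 */
    }
}
```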

4.5 Sequence and subjective importance of encoded parameters

The encoder will produce the output information in a unique sequence and format, and the decoder must receive the 
same information in the same way. In table 6, the sequence of output bits s1 to s244 and the bit allocation for each
parameter are shown.

The different parameters of the encoded speech and their individual bits have unequal importance with respect to
subjective quality. Before being submitted to the channel encoding function, the bits have to be rearranged in the
sequence of importance as given in table 6 in GSM 05.03 [3].



5 Functional description of the encoder 

In this clause, the different functions of the encoder represented in figure 3 are described. 



5.1 Pre-processing 



Two pre-processing functions are applied prior to the encoding process: high-pass filtering and signal down-scaling. 

Down-scaling consists of dividing the input by a factor of 2 to reduce the possibility of overflows in the fixed-point 
implementation. 

The high-pass filter serves as a precaution against undesired low frequency components. A filter with a cut-off
frequency of 80 Hz is used, and it is given by:

$$H_{h1}(z) = \frac{0.92727435 - 1.8544941 z^{-1} + 0.92727435 z^{-2}}{1 - 1.9059465 z^{-1} + 0.9114024 z^{-2}}. \tag{4}$$

Down-scaling and high-pass filtering are combined by dividing the coefficients at the numerator of $H_{h1}(z)$ by 2.
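The combined high-pass filter and down-scaling can be sketched as a direct-form I biquad with the numerator coefficients of (4) halved. This floating-point version only illustrates the structure; the bit-exact fixed-point filter is defined by GSM 06.53 [6], and the type and function names are hypothetical.

```c
/* Filter state: previous two inputs and previous two outputs. */
typedef struct {
    double x1, x2;
    double y1, y2;
} hp_state;

/* y(n) = b0 x(n) + b1 x(n-1) + b2 x(n-2) + 1.9059465 y(n-1) - 0.9114024 y(n-2),
   i.e. equation (4) with the numerator divided by 2 for down-scaling.
   in and out may point to the same buffer. */
void pre_process(hp_state *st, const double *in, double *out, int len)
{
    const double b0 = 0.92727435 / 2.0;
    const double b1 = -1.8544941 / 2.0;
    const double b2 = 0.92727435 / 2.0;
    for (int n = 0; n < len; n++) {
        double x0 = in[n];
        double y0 = b0 * x0 + b1 * st->x1 + b2 * st->x2
                  + 1.9059465 * st->y1 - 0.9114024 * st->y2;
        st->x2 = st->x1;
        st->x1 = x0;
        st->y2 = st->y1;
        st->y1 = y0;
        out[n] = y0;
    }
}
```

At DC the numerator of (4) nearly cancels, so a constant input settles to roughly 0.5 % of its level after the combined scaling and filtering.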
5.2 Linear prediction analysis and quantization

Short-term prediction, or linear prediction (LP), analysis is performed twice per speech frame using the auto-correlation 
approach with 30 ms asymmetric windows. No lookahead is used in the auto-correlation computation. 

The auto-correlations of windowed speech are converted to the LP coefficients using the Levinson-Durbin algorithm. 
Then the LP coefficients are transformed to the Line Spectral Pair (LSP) domain for quantization and interpolation 
purposes. The interpolated quantified and unquantized filter coefficients are converted back to the LP filter coefficients 
(to construct the synthesis and weighting filters at each subframe). 

5.2.1 Windowing and auto-correlation computation 

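LP analysis is performed twice per frame using two different asymmetric windows. The first window has its weight
concentrated at the second subframe and it consists of two halves of Hamming windows with different sizes. The
window is given by:

$$w_I(n) = \begin{cases} 0.54 - 0.46\cos\left(\dfrac{\pi n}{L_1^{(I)} - 1}\right), & n = 0,\ldots,L_1^{(I)}-1,\\[2ex] 0.54 + 0.46\cos\left(\dfrac{\pi (n - L_1^{(I)})}{L_2^{(I)} - 1}\right), & n = L_1^{(I)},\ldots,L_1^{(I)}+L_2^{(I)}-1. \end{cases} \tag{5}$$

The values $L_1^{(I)} = 160$ and $L_2^{(I)} = 80$ are used. The second window has its weight concentrated at the fourth
subframe and it consists of two parts: the first part is half a Hamming window and the second part is a quarter of a
cosine function cycle. The window is given by:

$$w_{II}(n) = \begin{cases} 0.54 - 0.46\cos\left(\dfrac{2\pi n}{2L_1^{(II)} - 1}\right), & n = 0,\ldots,L_1^{(II)}-1,\\[2ex] \cos\left(\dfrac{2\pi (n - L_1^{(II)})}{4L_2^{(II)} - 1}\right), & n = L_1^{(II)},\ldots,L_1^{(II)}+L_2^{(II)}-1, \end{cases} \tag{6}$$

where the values $L_1^{(II)} = 232$ and $L_2^{(II)} = 8$ are used.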

Note that both LP analyses are performed on the same set of speech samples. The windows are applied to 80 samples
from the past speech frame in addition to the 160 samples of the present speech frame. No samples from future frames are
used (no lookahead). A diagram of the two LP analysis windows is depicted below.
[Figure: windows $w_I(n)$ and $w_{II}(n)$ spanning the end of frame n-1 and frame n (frame: 160 samples, 20 ms; subframe: 40 samples, 5 ms)]

Figure 1: LP analysis windows

The auto-correlations of the windowed speech $s'(n), n = 0,\ldots,239$, are computed by:

$$r_{ac}(k) = \sum_{n=k}^{239} s'(n)\, s'(n-k), \quad k = 0,\ldots,10, \tag{7}$$

and a 60 Hz bandwidth expansion is used by lag windowing the auto-correlations using the window:

$$w_{lag}(i) = \exp\left[-\frac{1}{2}\left(\frac{2\pi f_0 i}{f_s}\right)^2\right], \quad i = 1,\ldots,10, \tag{8}$$

where $f_0 = 60$ Hz is the bandwidth expansion and $f_s = 8000$ Hz is the sampling frequency. Further, $r_{ac}(0)$ is
multiplied by the white noise correction factor 1.0001, which is equivalent to adding a noise floor at -40 dB.

5.2.2 Levinson-Durbin algorithm 

The modified auto-correlations $r'_{ac}(0) = 1.0001\, r_{ac}(0)$ and $r'_{ac}(k) = r_{ac}(k)\, w_{lag}(k), k = 1,\ldots,10$, are used to
obtain the direct form LP filter coefficients $a_k, k = 1,\ldots,10$, by solving the set of equations:

$$\sum_{k=1}^{10} a_k\, r'_{ac}(|i-k|) = -r'_{ac}(i), \quad i = 1,\ldots,10. \tag{9}$$

The set of equations in (9) is solved using the Levinson-Durbin algorithm. This algorithm uses the following recursion:
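$$
\begin{aligned}
&E_{LD}(0) = r'_{ac}(0) \\
&\textbf{for } i = 1 \textbf{ to } 10 \textbf{ do} \\
&\quad a_0^{(i-1)} = 1 \\
&\quad k_i = -\Big[\textstyle\sum_{j=0}^{i-1} a_j^{(i-1)}\, r'_{ac}(i-j)\Big] \Big/ E_{LD}(i-1) \\
&\quad a_i^{(i)} = k_i \\
&\quad \textbf{for } j = 1 \textbf{ to } i-1 \textbf{ do} \\
&\qquad a_j^{(i)} = a_j^{(i-1)} + k_i\, a_{i-j}^{(i-1)} \\
&\quad \textbf{end} \\
&\quad E_{LD}(i) = (1 - k_i^2)\, E_{LD}(i-1) \\
&\textbf{end}
\end{aligned}
$$

The final solution is given as $a_j = a_j^{(10)}$, $j = 1,\ldots,10$.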
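A floating-point sketch of the recursion above, assuming the sign convention of equation (9), so that $A(z) = 1 + \sum_{k=1}^{10} a_k z^{-k}$. No check for unstable filters is included; `levinson` is a hypothetical name.

```c
#define M 10

/* Solves equation (9): r[0..M] are r'_ac(k); a[0..M] receives 1, a_1..a_10.
 * Returns the final prediction error E_LD(M). */
double levinson(const double r[M + 1], double a[M + 1])
{
    double tmp[M + 1];
    double err = r[0];                      /* E_LD(0) */
    a[0] = 1.0;
    for (int j = 1; j <= M; j++) a[j] = 0.0;
    for (int i = 1; i <= M; i++) {
        double acc = 0.0;
        for (int j = 0; j < i; j++)
            acc += a[j] * r[i - j];
        double k = -acc / err;              /* reflection coefficient k_i */
        for (int j = 1; j < i; j++)
            tmp[j] = a[j] + k * a[i - j];   /* a_j^(i) from order i-1 values */
        for (int j = 1; j < i; j++)
            a[j] = tmp[j];
        a[i] = k;                           /* a_i^(i) = k_i */
        err *= (1.0 - k * k);               /* E_LD(i) */
    }
    return err;
}
```

For the autocorrelation $r(k) = 0.5^k$ of a first-order process, the recursion returns $a_1 = -0.5$, all higher coefficients zero, and a residual error of 0.75.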



The LP filter coefficients are converted to the line spectral pair (LSP) representation for quantization and interpolation purposes. The conversions to the LSP domain and back to the LP filter coefficient domain are described in the next two clauses.

5.2.3 LP to LSP conversion 

The LP filter coefficients $a_k$, $k = 1,\ldots,10$, are converted to the line spectral pair (LSP) representation for quantization and interpolation purposes. For a 10th order LP filter, the LSPs are defined as the roots of the sum and difference polynomials:

$$
F_1'(z) = A(z) + z^{-11} A(z^{-1}) \qquad (10)
$$

and

$$
F_2'(z) = A(z) - z^{-11} A(z^{-1}), \qquad (11)
$$

respectively. The polynomials $F_1'(z)$ and $F_2'(z)$ are symmetric and antisymmetric, respectively. It can be proven that all roots of these polynomials are on the unit circle and that they alternate with each other. $F_1'(z)$ has a root $z = -1$ ($\omega = \pi$) and $F_2'(z)$ has a root $z = 1$ ($\omega = 0$). To eliminate these two roots, we define the new polynomials:

$$
F_1(z) = F_1'(z)/(1 + z^{-1}) \qquad (12)
$$

and

$$
F_2(z) = F_2'(z)/(1 - z^{-1}). \qquad (13)
$$

Each polynomial has 5 conjugate roots on the unit circle ($e^{\pm j\omega_i}$); therefore, the polynomials can be written as

$$
F_1(z) = \prod_{i=1,3,\ldots,9} \left(1 - 2q_i z^{-1} + z^{-2}\right) \qquad (14)
$$

and

$$
F_2(z) = \prod_{i=2,4,\ldots,10} \left(1 - 2q_i z^{-1} + z^{-2}\right), \qquad (15)
$$

where $q_i = \cos(\omega_i)$ with $\omega_i$ being the line spectral frequencies (LSF), and they satisfy the ordering property $0 < \omega_1 < \omega_2 < \cdots < \omega_{10} < \pi$. We refer to $q_i$ as the LSPs in the cosine domain.

Since both polynomials $F_1(z)$ and $F_2(z)$ are symmetric, only the first 5 coefficients of each polynomial need to be computed. The coefficients of these polynomials are found by the recursive relations (for $i = 0$ to 4):

$$
\begin{aligned}
f_1(i+1) &= a_{i+1} + a_{m-i} - f_1(i), \\
f_2(i+1) &= a_{i+1} - a_{m-i} + f_2(i),
\end{aligned} \qquad (16)
$$

where $m = 10$ is the predictor order and $f_1(0) = f_2(0) = 1$.

The LSPs are found by evaluating the polynomials $F_1(z)$ and $F_2(z)$ at 60 points equally spaced between 0 and $\pi$ and checking for sign changes. A sign change signifies the existence of a root and the sign change interval is then divided 4 times to better track the root. The Chebyshev polynomials are used to evaluate $F_1(z)$ and $F_2(z)$. In this method the roots are found directly in the cosine domain $\{q_i\}$. The polynomials $F_1(z)$ or $F_2(z)$ evaluated at $z = e^{j\omega}$ can be written as:

$$
F(\omega) = 2e^{-j5\omega}\, C(x),
$$

with:

$$
C(x) = T_5(x) + f(1)T_4(x) + f(2)T_3(x) + f(3)T_2(x) + f(4)T_1(x) + f(5)/2, \qquad (17)
$$

where $T_m(x) = \cos(m\omega)$ is the $m$th order Chebyshev polynomial, and $f(i)$, $i = 1,\ldots,5$, are the coefficients of either $F_1(z)$ or $F_2(z)$, computed using the equations in (16). The polynomial $C(x)$ is evaluated at a certain value of $x = \cos(\omega)$ using the recursive relation:

$$
\begin{aligned}
&\textbf{for } k = 4 \textbf{ down to } 1 \\
&\quad \lambda_k = 2x\lambda_{k+1} - \lambda_{k+2} + f(5-k) \\
&\textbf{end} \\
&C(x) = x\lambda_1 - \lambda_2 + f(5)/2,
\end{aligned}
$$

with initial values $\lambda_5 = 1$ and $\lambda_6 = 0$. The details of the Chebyshev polynomial evaluation method are found in P. Kabal and R.P. Ramachandran [6].

5.2.4 LSP to LP conversion 

Once the LSPs are quantified and interpolated, they are converted back to the LP coefficient domain $\{a_k\}$. The conversion to the LP domain is done as follows. The coefficients of $F_1(z)$ or $F_2(z)$ are found by expanding equations (14) and (15) knowing the quantified and interpolated LSPs $q_i$, $i = 1,\ldots,10$. The following recursive relation is used to compute $f_1(i)$:

$$
\begin{aligned}
&\textbf{for } i = 1 \textbf{ to } 5 \\
&\quad f_1(i) = -2q_{2i-1}\, f_1(i-1) + 2f_1(i-2) \\
&\quad \textbf{for } j = i-1 \textbf{ down to } 1 \\
&\qquad f_1(j) = f_1(j) - 2q_{2i-1}\, f_1(j-1) + f_1(j-2) \\
&\quad \textbf{end} \\
&\textbf{end}
\end{aligned}
$$

with initial values $f_1(0) = 1$ and $f_1(-1) = 0$. The coefficients $f_2(i)$ are computed similarly by replacing $q_{2i-1}$ by $q_{2i}$.

Once the coefficients $f_1(i)$ and $f_2(i)$ are found, $F_1(z)$ and $F_2(z)$ are multiplied by $1 + z^{-1}$ and $1 - z^{-1}$, respectively, to obtain $F_1'(z)$ and $F_2'(z)$; that is:

$$
\begin{aligned}
f_1'(i) &= f_1(i) + f_1(i-1), & i &= 1,\ldots,5, \\
f_2'(i) &= f_2(i) - f_2(i-1), & i &= 1,\ldots,5.
\end{aligned} \qquad (18)
$$

Finally the LP coefficients are found by:

$$
a_i = \begin{cases}
0.5 f_1'(i) + 0.5 f_2'(i), & i = 1,\ldots,5, \\
0.5 f_1'(11-i) - 0.5 f_2'(11-i), & i = 6,\ldots,10.
\end{cases} \qquad (19)
$$

This is directly derived from the relation $A(z) = [F_1'(z) + F_2'(z)]/2$, and considering the fact that $F_1'(z)$ and $F_2'(z)$ are symmetric and antisymmetric polynomials, respectively.

5.2.5 Quantization of the LSP coefficients

The two sets of LP filter coefficients per frame are quantified using the LSP representation in the frequency domain; that is:

$$
f_i = \frac{f_s}{2\pi} \arccos(q_i), \quad i = 1,\ldots,10, \qquad (20)
$$

where $f_i$ are the line spectral frequencies (LSF) in Hz [0, 4000] and $f_s = 8000$ is the sampling frequency. The LSF vector is given by $\mathbf{f}^t = [f_1\ f_2 \cdots f_{10}]$, with $t$ denoting transpose.

A 1st order MA prediction is applied, and the two residual LSF vectors are jointly quantified using split matrix quantization (SMQ). The prediction and quantization are performed as follows. Let $\mathbf{z}^{(1)}(n)$ and $\mathbf{z}^{(2)}(n)$ denote the mean-removed LSF vectors at frame $n$. The prediction residual vectors $\mathbf{r}^{(1)}(n)$ and $\mathbf{r}^{(2)}(n)$ are given by:

$$
\begin{aligned}
\mathbf{r}^{(1)}(n) &= \mathbf{z}^{(1)}(n) - \mathbf{p}(n), \quad \text{and} \\
\mathbf{r}^{(2)}(n) &= \mathbf{z}^{(2)}(n) - \mathbf{p}(n),
\end{aligned} \qquad (21)
$$

where $\mathbf{p}(n)$ is the predicted LSF vector at frame $n$. First order moving-average (MA) prediction is used where:

$$
\mathbf{p}(n) = 0.65\, \hat{\mathbf{r}}^{(2)}(n-1), \qquad (22)
$$

where $\hat{\mathbf{r}}^{(2)}(n-1)$ is the quantified second residual vector at the past frame.

The two LSF residual vectors $\mathbf{r}^{(1)}$ and $\mathbf{r}^{(2)}$ are jointly quantified using split matrix quantization (SMQ). The matrix $(\mathbf{r}^{(1)}\ \mathbf{r}^{(2)})$ is split into 5 submatrices of dimension 2×2 (two elements from each vector). For example, the first submatrix consists of the elements $r_1^{(1)}$, $r_2^{(1)}$, $r_1^{(2)}$, and $r_2^{(2)}$. The 5 submatrices are quantified with 7, 8, 8+1, 8, and 6 bits, respectively, 38 bits in total. The third submatrix uses a 256-entry signed codebook (8-bit index plus 1-bit sign).
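The prediction of equations (21) and (22) and the 2×2 split can be sketched as follows, assuming 0-based storage so that submatrix `s` (0 to 4) takes elements 2s and 2s+1 of each residual. The names are hypothetical and the codebooks themselves are not reproduced.

```c
#define M 10

/* Bits per 2x2 submatrix; the third adds a sign bit (8 + 1). */
const int smq_bits[5] = {7, 8, 9, 8, 6};

/* Equations (21) and (22): z1, z2 are the mean-removed LSF vectors and
 * rq2_prev is the quantified second residual vector of the past frame. */
void lsf_residuals(const double z1[M], const double z2[M],
                   const double rq2_prev[M], double r1[M], double r2[M])
{
    for (int i = 0; i < M; i++) {
        double p = 0.65 * rq2_prev[i];      /* MA prediction p(n) */
        r1[i] = z1[i] - p;
        r2[i] = z2[i] - p;
    }
}

/* Submatrix s = 0..4: elements 2s and 2s+1 of each residual; s = 0
 * gives r_1^(1), r_2^(1), r_1^(2), r_2^(2). */
void lsf_submatrix(const double r1[M], const double r2[M], int s, double sub[4])
{
    sub[0] = r1[2 * s];
    sub[1] = r1[2 * s + 1];
    sub[2] = r2[2 * s];
    sub[3] = r2[2 * s + 1];
}
```

Because both vectors share the same prediction, quantizing a submatrix jointly exploits the correlation between the two LSF vectors of one frame.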
A weighted LSP distortion measure is used in the quantization process. In general, for an input LSP vector $\mathbf{f}$ and a quantified vector at index $k$, $\hat{\mathbf{f}}^k$, the quantization is performed by finding the index $k$ which minimizes:

$$
E_{LSP} = \sum_{i=1}^{10} \left[f_i w_i - \hat{f}_i^k w_i\right]^2. \qquad (23)
$$

The weighting factors $w_i$, $i = 1,\ldots,10$, are given by

$$
w_i = \begin{cases}
3.347 - \dfrac{1.547}{450}\, d_i, & d_i < 450, \\[6pt]
1.8 - \dfrac{0.8}{1050}\,(d_i - 450), & \text{otherwise},
\end{cases} \qquad (24)
$$

where $d_i = f_{i+1} - f_{i-1}$ with $f_0 = 0$ and $f_{11} = 4000$. Here, two sets of weighting coefficients are computed for the two LSF vectors. In the quantization of each submatrix, two weighting coefficients from each set are used with their corresponding LSFs.

5.2.6 Interpolation of the LSPs 

The two sets of quantified (and unquantized) LP parameters are used for the second and fourth subframes whereas the first and third subframes use a linear interpolation of the parameters in the adjacent subframes. The interpolation is performed on the LSPs in the $q$ domain. Let $\hat{\mathbf{q}}_4^{(n)}$ be the LSP vector at the 4th subframe of the present frame $n$, $\hat{\mathbf{q}}_2^{(n)}$ be the LSP vector at the 2nd subframe of the present frame $n$, and $\hat{\mathbf{q}}_4^{(n-1)}$ the LSP vector at the 4th subframe of the past frame $n-1$. The interpolated LSP vectors at the 1st and 3rd subframes are given by:

$$
\begin{aligned}
\hat{\mathbf{q}}_1^{(n)} &= 0.5\, \hat{\mathbf{q}}_4^{(n-1)} + 0.5\, \hat{\mathbf{q}}_2^{(n)}, \\
\hat{\mathbf{q}}_3^{(n)} &= 0.5\, \hat{\mathbf{q}}_2^{(n)} + 0.5\, \hat{\mathbf{q}}_4^{(n)}.
\end{aligned} \qquad (25)
$$

The interpolated LSP vectors are used to compute a different LP filter at each subframe (both quantified and unquantized coefficients) using the LSP to LP conversion method described in clause 5.2.4.
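Interpolation (25) is a plain element-wise average. A sketch, assuming 0-based arrays where row k of `q_sub` is the LSP vector of subframe k+1 that would then go through the LSP to LP conversion; the names are hypothetical.

```c
#define M 10

/* Equation (25): q4_prev = q_4^(n-1); q2, q4 from the present frame. */
void interpolate_lsp(const double q4_prev[M], const double q2[M],
                     const double q4[M], double q_sub[4][M])
{
    for (int i = 0; i < M; i++) {
        q_sub[0][i] = 0.5 * q4_prev[i] + 0.5 * q2[i];  /* 1st subframe */
        q_sub[1][i] = q2[i];                           /* 2nd subframe */
        q_sub[2][i] = 0.5 * q2[i] + 0.5 * q4[i];       /* 3rd subframe */
        q_sub[3][i] = q4[i];                           /* 4th subframe */
    }
}
```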



5.3 Open-loop pitch analysis 



Open-loop pitch analysis is performed twice per frame (each 10 ms) to find two estimates of the pitch lag in each frame. 
This is done in order to simplify the pitch analysis and confine the closed-loop pitch search to a small number of lags 
around the open-loop estimated lags. 

Open-loop pitch estimation is based on the weighted speech signal $s_w(n)$ which is obtained by filtering the input speech signal through the weighting filter $W(z) = A(z/\gamma_1)/A(z/\gamma_2)$. That is, in a subframe of size $L$, the weighted speech is given by:

$$
s_w(n) = s(n) + \sum_{i=1}^{10} a_i \gamma_1^i\, s(n-i) - \sum_{i=1}^{10} a_i \gamma_2^i\, s_w(n-i), \quad n = 0,\ldots,L-1. \qquad (26)
$$
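Equation (26) is a pole-zero filter whose coefficients are $a_i\gamma^i$. The sketch below assumes the caller keeps 10 samples of history before both the input and output pointers; the values of $\gamma_1$ and $\gamma_2$ are left as parameters because this clause does not state them. The names are hypothetical.

```c
#define M 10

/* ap[i] = a[i] * gamma^i, the coefficients of A(z/gamma). */
static void weight_coeffs(const double a[M + 1], double gamma, double ap[M + 1])
{
    double g = 1.0;
    for (int i = 0; i <= M; i++) {
        ap[i] = a[i] * g;
        g *= gamma;
    }
}

/* Equation (26) over L samples. s and sw point at the current subframe;
 * s[-M..-1] and sw[-M..-1] must hold the past input and output. */
void weighted_speech(const double *s, double *sw, const double a[M + 1],
                     double gamma1, double gamma2, int L)
{
    double an[M + 1], ad[M + 1];
    weight_coeffs(a, gamma1, an);           /* numerator A(z/gamma1) */
    weight_coeffs(a, gamma2, ad);           /* denominator A(z/gamma2) */
    for (int n = 0; n < L; n++) {
        double acc = s[n];
        for (int i = 1; i <= M; i++)
            acc += an[i] * s[n - i] - ad[i] * sw[n - i];
        sw[n] = acc;
    }
}
```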
Open-loop pitch analysis is performed as follows. In the first step, 3 maxima of the correlation:

$$
O_k = \sum_{n=0}^{79} s_w(n)\, s_w(n-k) \qquad (27)
$$

are found in the three ranges:

$i = 3$: 18, ..., 35,
$i = 2$: 36, ..., 71,
$i = 1$: 72, ..., 143.

The retained maxima $O_{t_i}$, $i = 1,\ldots,3$, are normalized by dividing by $\sqrt{\sum_{n=0}^{79} s_w^2(n - t_i)}$, $i = 1,\ldots,3$, respectively. The normalized maxima and corresponding delays are denoted by $(M_i, t_i)$, $i = 1,\ldots,3$. The winner, $T_{op}$, among the three normalized correlations is selected by favouring the delays with the values in the lower range. This is performed by weighting the normalized correlations corresponding to the longer delays. The best open-loop delay $T_{op}$ is determined as follows:

$$
\begin{aligned}
&T_{op} = t_1, \quad M(T_{op}) = M_1 \\
&\textbf{if } M_2 > 0.85\, M(T_{op}) \\
&\quad M(T_{op}) = M_2, \quad T_{op} = t_2 \\
&\textbf{end} \\
&\textbf{if } M_3 > 0.85\, M(T_{op}) \\
&\quad M(T_{op}) = M_3, \quad T_{op} = t_3 \\
&\textbf{end}
\end{aligned}
$$

This procedure of dividing the delay range into 3 sections and favouring the lower sections is used to avoid choosing pitch multiples.

5.4 Impulse response computation 

The impulse response, $h(n)$, of the weighted synthesis filter $H(z)W(z) = A(z/\gamma_1)/[A(z)A(z/\gamma_2)]$ is computed each subframe. This impulse response is needed for the search of adaptive and fixed codebooks. The impulse response $h(n)$ is computed by filtering the vector of coefficients of the filter $A(z/\gamma_1)$ extended by zeros through the two filters $1/A(z)$ and $1/A(z/\gamma_2)$.
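A sketch of the impulse-response computation described above, assuming zero initial filter states and $a_0 = 1$; `impulse_response` is a hypothetical name.

```c
#define M 10
#define L_SUBFR 40

/* h(n) of H(z)W(z) = A(z/g1) / [A(z) A(z/g2)] for n = 0..39. */
void impulse_response(const double a[M + 1], double g1, double g2,
                      double h[L_SUBFR])
{
    double x[L_SUBFR], y[L_SUBFR], ad[M + 1];
    double g = 1.0;
    for (int i = 0; i <= M; i++) { ad[i] = a[i] * g; g *= g2; }  /* A(z/g2) */
    g = 1.0;
    for (int n = 0; n < L_SUBFR; n++) {     /* A(z/g1) coefficients, zero-extended */
        x[n] = (n <= M) ? a[n] * g : 0.0;
        g *= g1;
    }
    for (int n = 0; n < L_SUBFR; n++) {     /* through 1/A(z) */
        double acc = x[n];
        for (int i = 1; i <= M && i <= n; i++) acc -= a[i] * y[n - i];
        y[n] = acc;
    }
    for (int n = 0; n < L_SUBFR; n++) {     /* through 1/A(z/g2) */
        double acc = y[n];
        for (int i = 1; i <= M && i <= n; i++) acc -= ad[i] * h[n - i];
        h[n] = acc;
    }
}
```

With $\gamma_1 = \gamma_2$ the weighting cancels, so for $A(z) = 1 - 0.5z^{-1}$ the result is $h(n) = 0.5^n$.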

5.5 Target signal computation 

The target signal for adaptive codebook search is usually computed by subtracting the zero input response of the weighted synthesis filter $H(z)W(z) = A(z/\gamma_1)/[A(z)A(z/\gamma_2)]$ from the weighted speech signal $s_w(n)$. This is performed on a subframe basis.

An equivalent procedure for computing the target signal, which is used in the present document, is the filtering of the LP residual signal $res_{LP}(n)$ through the combination of synthesis filter $1/A(z)$ and the weighting filter $A(z/\gamma_1)/A(z/\gamma_2)$. After determining the excitation for the subframe, the initial states of these filters are updated by filtering the difference between the LP residual and excitation. The memory update of these filters is explained in clause 5.9.

The residual signal $res_{LP}(n)$ which is needed for finding the target vector is also used in the adaptive codebook search to extend the past excitation buffer. This simplifies the adaptive codebook search procedure for delays less than the subframe size of 40 as will be explained in the next clause. The LP residual is given by:

$$
res_{LP}(n) = s(n) + \sum_{i=1}^{10} a_i\, s(n-i). \qquad (28)
$$



5.6 Adaptive codebook search 



Adaptive codebook search is performed on a subframe basis. It consists of performing closed-loop pitch search, and 
then computing the adaptive codevector by interpolating the past excitation at the selected fractional pitch lag. 

The adaptive codebook parameters (or pitch parameters) are the delay and gain of the pitch filter. In the adaptive 
codebook approach for implementing the pitch filter, the excitation is repeated for delays less than the subframe length. 
In the search stage, the excitation is extended by the LP residual to simplify the closed-loop search. 



In the first and third subframes, a fractional pitch delay is used with resolutions: 1/6 in the range $[17\tfrac{3}{6}, 94\tfrac{3}{6}]$ and integers only in the range [95, 143]. For the second and fourth subframes, a pitch resolution of 1/6 is always used in the range $[T_1 - 5\tfrac{3}{6}, T_1 + 4\tfrac{3}{6}]$, where $T_1$ is the nearest integer to the fractional pitch lag of the previous (1st or 3rd) subframe, bounded by 18...143.

Closed-loop pitch analysis is performed around the open-loop pitch estimates on a subframe basis. In the first (and third) subframe the range $T_{op} \pm 3$, bounded by 18...143, is searched. For the other subframes, closed-loop pitch analysis is performed around the integer pitch selected in the previous subframe, as described above. The pitch delay is encoded with 9 bits in the first and third subframes and the relative delay of the other subframes is encoded with 6 bits.
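A quick count from the ranges above shows how the two lag resolutions fit the stated bit allocations:

```latex
\begin{aligned}
\text{1st/3rd subframes: } & \left(94\tfrac{3}{6} - 17\tfrac{3}{6}\right)\cdot 6 + 1 = 463 \text{ fractional lags} \\
& 463 + (143 - 95 + 1) = 463 + 49 = 512 = 2^9 \text{ values (9 bits)} \\
\text{2nd/4th subframes: } & \left(\left(T_1 + 4\tfrac{3}{6}\right) - \left(T_1 - 5\tfrac{3}{6}\right)\right)\cdot 6 + 1 = 61 \le 2^6 = 64 \text{ values (6 bits)}
\end{aligned}
```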

The closed-loop pitch search is performed by minimizing the mean-square weighted error between the original and synthesized speech. This is achieved by maximizing the term:

$$
R(k) = \frac{\sum_{n=0}^{39} x(n)\, y_k(n)}{\sqrt{\sum_{n=0}^{39} y_k(n)\, y_k(n)}}, \qquad (29)
$$

where $x(n)$ is the target signal and $y_k(n)$ is the past filtered excitation at delay $k$ (past excitation convolved with $h(n)$). Note that the search range is limited around the open-loop pitch as explained earlier.

The convolution $y_k(n)$ is computed for the first delay $t_{min}$ in the searched range, and for the other delays in the search range $k = t_{min}+1,\ldots,t_{max}$, it is updated using the recursive relation:

$$
y_k(n) = y_{k-1}(n-1) + u(-k)\, h(n), \qquad (30)
$$

where $u(n)$, $n = -(143+11),\ldots,39$, is the excitation buffer. Note that in the search stage, the samples $u(n)$, $n = 0,\ldots,39$, are not known, and they are needed for pitch delays less than 40. To simplify the search, the LP residual is copied to $u(n)$ in order to make the relation in equation (30) valid for all delays.
Once the optimum integer pitch delay is determined, the fractions from $-\tfrac{3}{6}$ to $\tfrac{3}{6}$ with a step of $\tfrac{1}{6}$ around that integer are tested. The fractional pitch search is performed by interpolating the normalized correlation in equation (29) and searching for its maximum. The interpolation is performed using an FIR filter $b_{24}$ based on a Hamming windowed $\sin(x)/x$ function truncated at ±23 and padded with zeros at ±24 ($b_{24}(24) = 0$). The filter has its cut-off frequency (-3 dB) at 3 600 Hz in the over-sampled domain. The interpolated values of $R(k)$ for the fractions $-\tfrac{3}{6}$ to $\tfrac{3}{6}$ are obtained using the interpolation formula:

$$
R(k)_t = \sum_{i=0}^{3} R(k-i)\, b_{24}(t + 6i) + \sum_{i=0}^{3} R(k+1+i)\, b_{24}(6 - t + 6i), \quad t = 0,\ldots,5, \qquad (31)
$$

where $t = 0,\ldots,5$ corresponds to the fractions 0, $\tfrac{1}{6}$, $\tfrac{2}{6}$, $\tfrac{3}{6}$, $-\tfrac{2}{6}$, and $-\tfrac{1}{6}$, respectively. Note that it is necessary to compute the correlation terms in equation (29) using a range $t_{min} - 4$, $t_{max} + 4$, to allow for the proper interpolation.

Once the fractional pitch lag is determined, the adaptive codebook vector $v(n)$ is computed by interpolating the past excitation signal $u(n)$ at the given integer delay $k$ and phase (fraction) $t$:

$$
v(n) = \sum_{i=0}^{9} u(n-k-i)\, b_{60}(t + 6i) + \sum_{i=0}^{9} u(n-k+1+i)\, b_{60}(6 - t + 6i), \quad n = 0,\ldots,39,\ t = 0,\ldots,5. \qquad (32)
$$

The interpolation filter $b_{60}$ is based on a Hamming windowed $\sin(x)/x$ function truncated at ±59 and padded with zeros at ±60 ($b_{60}(60) = 0$). The filter has a cut-off frequency (-3 dB) at 3 600 Hz in the over-sampled domain.
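Equations (31) and (32) share one polyphase structure, so a single helper can serve both. This sketch takes the filter as an input because the tabulated $b_{24}$ and $b_{60}$ coefficients are not reproduced here. Writing each $v(n)$ back into the excitation buffer is an assumption that makes delays shorter than the subframe reuse the samples just computed, matching the repetition described at the start of clause 5.6. The names are hypothetical.

```c
/* Polyphase interpolation of (31)/(32). x points at the sample for the
 * integer lag (R(k) or u(n-k)); b is the one-sided filter b[0..6*taps]
 * (taps = 4 for b24, 10 for b60); t is the phase index 0..5. */
double interp_frac(const double *x, const double *b, int taps, int t)
{
    double acc = 0.0;
    for (int i = 0; i < taps; i++) {
        acc += x[-i] * b[t + 6 * i];         /* samples at and before the lag */
        acc += x[1 + i] * b[6 - t + 6 * i];  /* samples after the lag */
    }
    return acc;
}

/* Equation (32), written in place into u[0..39]; u[-(k+9)..-1] must
 * hold the past excitation (assumption: k >= 17, so only already
 * computed samples are read for n >= 0). */
void adaptive_codevector(double *u, int k, int t, const double b60[61])
{
    for (int n = 0; n < 40; n++)
        u[n] = interp_frac(u + n - k, b60, 10, t);
}
```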

The adaptive codebook gain is then found by:

$$
g_p = \frac{\sum_{n=0}^{39} x(n)\, y(n)}{\sum_{n=0}^{39} y(n)\, y(n)}, \quad \text{bounded by } 0 \le g_p \le 1.2, \qquad (33)
$$

where $y(n) = v(n) * h(n)$ is the filtered adaptive codebook vector (zero state response of $H(z)W(z)$ to $v(n)$). The computed adaptive codebook gain is quantified using 4-bit non-uniform scalar quantization in the range [0.0, 1.2].



5.7 Algebraic codebook structure and search 

The algebraic codebook structure is based on interleaved single-pulse permutation (ISPP) design. In this codebook, the innovation vector contains 10 non-zero pulses. All pulses can have the amplitudes +1 or -1. The 40 positions in a subframe are divided into 5 tracks, where each track contains two pulses, as shown in table 2.

Table 2: Potential positions of individual pulses in the algebraic codebook

| Track | Pulse  | Positions                    |
|-------|--------|------------------------------|
| 1     | i0, i5 | 0, 5, 10, 15, 20, 25, 30, 35 |
| 2     | i1, i6 | 1, 6, 11, 16, 21, 26, 31, 36 |
| 3     | i2, i7 | 2, 7, 12, 17, 22, 27, 32, 37 |
| 4     | i3, i8 | 3, 8, 13, 18, 23, 28, 33, 38 |
| 5     | i4, i9 | 4, 9, 14, 19, 24, 29, 34, 39 |

The two pulse positions in each track are encoded with 6 bits (total of 30 bits, 3 bits for the position of every pulse), and the sign of the first pulse in the track is encoded with 1 bit (total of 5 bits).
For two pulses located in the same track, only one sign bit is needed. This sign bit indicates the sign of the first pulse. The sign of the second pulse depends on its position relative to the first pulse. If the position of the second pulse is smaller, then it has the opposite sign; otherwise it has the same sign as the first pulse.

All the 3-bit pulse positions are Gray coded in order to improve robustness against channel errors. This gives a total of 35 bits for the algebraic code.
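The sign rule and the 3-bit position indices of one track can be sketched as follows. Track indices 0 to 4 here correspond to tracks 1 to 5 of table 2. The standard reflected binary Gray code, the meaning of the sign bit (1 = negative) and the argument order are assumptions; the codec's own Gray table and bit layout may differ. The names are hypothetical.

```c
/* Standard reflected binary Gray code (assumption, see above). */
static int gray_encode(int v) { return v ^ (v >> 1); }

static int gray_decode(int g)
{
    int v = 0;
    for (; g != 0; g >>= 1) v ^= g;
    return v;
}

/* Orders two pulses of one track so that the sign rule holds, then emits
 * the sign bit and the two Gray-coded position indices. Pulses at the
 * same position with opposite signs cancel and are not representable. */
void encode_track(int pa, int sa, int pb, int sb,
                  int *sign_bit, int *g1, int *g2)
{
    int p1, p2, s1;
    if (sa == sb) {                   /* same sign: second must not be smaller */
        p1 = (pa <= pb) ? pa : pb;
        p2 = (pa <= pb) ? pb : pa;
        s1 = sa;
    } else {                          /* opposite sign: second must be smaller */
        p1 = (pa > pb) ? pa : pb;
        p2 = (pa > pb) ? pb : pa;
        s1 = (pa > pb) ? sa : sb;
    }
    *sign_bit = (s1 < 0);             /* assumption: 1 = negative */
    *g1 = gray_encode(p1 / 5);        /* 3-bit index within the track */
    *g2 = gray_encode(p2 / 5);
}

/* Inverse of encode_track for track index 0..4. */
void decode_track(int track, int sign_bit, int g1, int g2,
                  int *p1, int *s1, int *p2, int *s2)
{
    *p1 = 5 * gray_decode(g1) + track;
    *p2 = 5 * gray_decode(g2) + track;
    *s1 = sign_bit ? -1 : 1;
    *s2 = (*p2 < *p1) ? -*s1 : *s1;   /* the sign rule of this clause */
}
```

The encoder chooses which pulse is sent first so that the implied sign of the second pulse is correct, which is why a single sign bit suffices per track.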

The algebraic codebook is searched by minimizing the mean square error between the weighted input speech and the weighted synthesized speech. The target signal used in the closed-loop pitch search is updated by subtracting the adaptive codebook contribution. That is:

$$
x_2(n) = x(n) - \hat{g}_p\, y(n), \quad n = 0,\ldots,39, \qquad (34)
$$

where $y(n) = v(n) * h(n)$ is the filtered adaptive codebook vector and $\hat{g}_p$ is the quantified adaptive codebook gain. If $\mathbf{c}_k$ is the algebraic codevector at index $k$, then the algebraic codebook is searched by maximizing the term:

$$
A_k = \frac{(C_k)^2}{E_{D_k}} = \frac{(\mathbf{d}^t \mathbf{c}_k)^2}{\mathbf{c}_k^t \mathbf{\Phi}\, \mathbf{c}_k}, \qquad (35)
$$

where $\mathbf{d} = \mathbf{H}^t \mathbf{x}_2$ is the correlation between the target signal $x_2(n)$ and the impulse response $h(n)$, $\mathbf{H}$ is the lower triangular Toeplitz convolution matrix with diagonal $h(0)$ and lower diagonals $h(1),\ldots,h(39)$, and $\mathbf{\Phi} = \mathbf{H}^t \mathbf{H}$ is the matrix of correlations of $h(n)$. The vector $\mathbf{d}$ (backward filtered target) and the matrix $\mathbf{\Phi}$ are computed prior to the codebook search. The elements of the vector $\mathbf{d}$ are computed by

$$
d(n) = \sum_{i=n}^{39} x_2(i)\, h(i-n), \quad n = 0,\ldots,39, \qquad (36)
$$

and the elements of the symmetric matrix $\mathbf{\Phi}$ are computed by:

$$
\phi(i,j) = \sum_{n=j}^{39} h(n-i)\, h(n-j), \quad (j \ge i). \qquad (37)
$$
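The precomputations of equations (34), (36) and (37) can be sketched in floating point as below (hypothetical names). Only the upper triangle of $\mathbf{\Phi}$ is computed and then mirrored, since the matrix is symmetric.

```c
#define L_SUBFR 40

/* Equation (34): target for the algebraic codebook search. */
void update_target(const double x[L_SUBFR], const double y[L_SUBFR],
                   double gp_q, double x2[L_SUBFR])
{
    for (int n = 0; n < L_SUBFR; n++) x2[n] = x[n] - gp_q * y[n];
}

/* Equation (36): backward-filtered target d(n). */
void backward_filter(const double x2[L_SUBFR], const double h[L_SUBFR],
                     double d[L_SUBFR])
{
    for (int n = 0; n < L_SUBFR; n++) {
        double acc = 0.0;
        for (int i = n; i < L_SUBFR; i++) acc += x2[i] * h[i - n];
        d[n] = acc;
    }
}

/* Equation (37): phi(i, j) for j >= i, mirrored to the lower triangle. */
void h_correlation(const double h[L_SUBFR], double phi[L_SUBFR][L_SUBFR])
{
    for (int i = 0; i < L_SUBFR; i++)
        for (int j = i; j < L_SUBFR; j++) {
            double acc = 0.0;
            for (int n = j; n < L_SUBFR; n++) acc += h[n - i] * h[n - j];
            phi[i][j] = phi[j][i] = acc;
        }
}
```

Because the sums stop at sample 39, the diagonal of $\mathbf{\Phi}$ shrinks towards the end of the subframe: for $h = (1, 0.5, 0, \ldots)$, $\phi(0,0) = 1.25$ but $\phi(39,39) = 1$.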

The algebraic structure of the codebooks allows for very fast search procedures since the innovation vector $\mathbf{c}_k$ contains only a few nonzero pulses. The correlation in the numerator of equation (35) is given by:

$$
C = \sum_{i=0}^{N_p-1} \vartheta_i\, d(m_i), \qquad (38)
$$

where $m_i$ is the position of the $i$th pulse, $\vartheta_i$ is its amplitude, and $N_p$ is the number of pulses ($N_p = 10$). The energy in the denominator of equation (35) is given by:

$$
E_D = \sum_{i=0}^{N_p-1} \phi(m_i, m_i) + 2\sum_{i=0}^{N_p-2} \sum_{j=i+1}^{N_p-1} \vartheta_i \vartheta_j\, \phi(m_i, m_j). \qquad (39)
$$
To simplify the search procedure, the pulse amplitudes are preset by the mere quantization of an appropriate signal. In this case the signal $b(n)$, which is a sum of the normalized $d(n)$ vector and the normalized long-term prediction residual $res_{LTP}(n)$:

$$
b(n) = \frac{res_{LTP}(n)}{\sqrt{\sum_{i=0}^{39} res_{LTP}(i)\, res_{LTP}(i)}} + \frac{d(n)}{\sqrt{\sum_{i=0}^{39} d(i)\, d(i)}}, \quad n = 0,\ldots,39, \qquad (40)
$$

is used. This is simply done by setting the amplitude of a pulse at a certain position equal to the sign of $b(n)$ at that position. The simplification proceeds as follows (prior to the codebook search). First, the sign signal $s_b(n) = \mathrm{sign}[b(n)]$ and the signal $d'(n) = d(n)\, s_b(n)$ are computed. Second, the matrix $\mathbf{\Phi}$ is modified by including the sign information; that is, $\phi'(i,j) = s_b(i)\, s_b(j)\, \phi(i,j)$. The correlation in equation (38) is now given by:

$$
C = \sum_{i=0}^{N_p-1} d'(m_i) \qquad (41)
$$

and the energy in equation (39) is given by:

$$
E_D = \sum_{i=0}^{N_p-1} \phi'(m_i, m_i) + 2\sum_{i=0}^{N_p-2} \sum_{j=i+1}^{N_p-1} \phi'(m_i, m_j). \qquad (42)
$$

Having preset the pulse amplitudes, as explained above, the optimal pulse positions are determined using an efficient 
non-exhaustive analysis-by-synthesis search technique. In this technique, the term in equation (35) is tested for a small 
percentage of position combinations. 

First, for each of the five tracks the pulse positions with maximum absolute values of $b(n)$ are searched. From these the global maximum value for all the pulse positions is selected. The first pulse i0 is always set into the position corresponding to the global maximum value.

Next, four iterations are carried out. During each iteration the position of pulse i1 is set to the local maximum of one track. The rest of the pulses are searched in pairs by sequentially searching each of the pulse pairs {i2, i3}, {i4, i5}, {i6, i7} and {i8, i9} in nested loops. Every pulse has 8 possible positions, i.e., there are four 8×8 loops, resulting in 256 different combinations of pulse positions for each iteration.

In each iteration all the 9 pulse starting positions are cyclically shifted, so that the pulse pairs are changed and the pulse i1 is placed in a local maximum of a different track. The rest of the pulses are searched also for the other positions in the tracks. At least one pulse is located in a position corresponding to the global maximum and one pulse is located in a position corresponding to one of the 4 local maxima.
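The core of this search, one nested 8×8 loop over a pulse pair given the pulses already placed, can be sketched as below. The outer logic (global maximum, four iterations with cyclic shifting of starting positions) is not reproduced. The names are hypothetical, and ratios are compared by cross-multiplication to avoid divisions, which assumes positive energies.

```c
#define L_SUBFR 40
#define STEP 5   /* positions of one track are 5 apart (table 2) */

/* Searches tracks ta and tb (0..4) for the pair maximising C^2 / E_D,
 * given np pulses already placed at m[0..np-1]. dp and phi are d'(n)
 * and phi'(i, j) from the sign presetting. */
void search_pair(const double dp[L_SUBFR], double phi[L_SUBFR][L_SUBFR],
                 const int m[], int np, int ta, int tb, int *pa, int *pb)
{
    double c0 = 0.0, e0 = 0.0, cross[L_SUBFR];
    for (int i = 0; i < np; i++) {          /* contribution of placed pulses */
        c0 += dp[m[i]];
        e0 += phi[m[i]][m[i]];
        for (int j = i + 1; j < np; j++) e0 += 2.0 * phi[m[i]][m[j]];
    }
    for (int p = 0; p < L_SUBFR; p++) {     /* 2 * sum_i phi'(m_i, p) */
        cross[p] = 0.0;
        for (int i = 0; i < np; i++) cross[p] += 2.0 * phi[m[i]][p];
    }
    double best_num = -1.0, best_den = 1.0;
    *pa = ta;
    *pb = tb;
    for (int a = ta; a < L_SUBFR; a += STEP) {        /* 8 positions */
        double c1 = c0 + dp[a];
        double e1 = e0 + phi[a][a] + cross[a];
        for (int b = tb; b < L_SUBFR; b += STEP) {    /* 8 positions */
            double c = c1 + dp[b];
            double e = e1 + phi[b][b] + cross[b] + 2.0 * phi[a][b];
            if (c * c * best_den > best_num * e) {    /* c^2/e > best ratio */
                best_num = c * c;
                best_den = e;
                *pa = a;
                *pb = b;
            }
        }
    }
}
```

Each call evaluates 64 combinations, so four pairs per iteration give the 256 combinations stated above.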

A special feature incorporated in the codebook is that the selected codevector is filtered through an adaptive pre-filter $F_E(z)$ which enhances special spectral components in order to improve the synthesized speech quality. Here the filter $F_E(z) = 1/(1 - \beta z^{-T})$ is used, where $T$ is the nearest integer pitch lag to the closed-loop fractional pitch lag of the subframe, and $\beta$ is a pitch gain. In the present document, $\beta$ is given by the quantified pitch gain bounded by [0.0, 1.0]. Note that prior to the codebook search, the impulse response $h(n)$ must include the pre-filter $F_E(z)$. That is, $h(n) = h(n) + \beta h(n-T)$, $n = T,\ldots,39$.



The fixed codebook gain is then found by:

$$
g_c = \frac{\mathbf{x}_2^t \mathbf{z}}{\mathbf{z}^t \mathbf{z}}, \qquad (43)
$$

where $\mathbf{x}_2$ is the target vector for fixed codebook search and $\mathbf{z}$ is the fixed codebook vector convolved with $h(n)$,

$$
z(n) = \sum_{i=0}^{n} c(i)\, h(n-i), \quad n = 0,\ldots,39. \qquad (44)
$$



5.8 Quantization of the fixed codebook gain 

The fixed codebook gain quantization is performed using MA prediction with fixed coefficients. The 4th order MA prediction is performed on the innovation energy as follows. Let $E(n)$ be the mean-removed innovation energy (in dB) at subframe $n$, and given by:

$$
E(n) = 10\log\left(\frac{1}{N}\, g_c^2 \sum_{i=0}^{N-1} c^2(i)\right) - \bar{E}, \qquad (45)
$$

where $N = 40$ is the subframe size, $c(i)$ is the fixed codebook excitation, and $\bar{E} = 36$ dB is the mean of the innovation energy. The predicted energy is given by:

$$
\tilde{E}(n) = \sum_{i=1}^{4} b_i\, \hat{R}(n-i), \qquad (46)
$$

where $[b_1\ b_2\ b_3\ b_4] = [0.68\ 0.58\ 0.34\ 0.19]$ are the MA prediction coefficients, and $\hat{R}(k)$ is the quantified prediction error at subframe $k$. The predicted energy is used to compute a predicted fixed-codebook gain $g_c'$ as in equation (45) (by substituting $E(n)$ by $\tilde{E}(n)$ and $g_c$ by $g_c'$). This is done as follows. First, the mean innovation energy is found by:

$$
E_I = 10\log\left(\frac{1}{N} \sum_{j=0}^{N-1} c^2(j)\right) \qquad (47)
$$

and then the predicted gain $g_c'$ is found by:

$$
g_c' = 10^{0.05(\tilde{E}(n) + \bar{E} - E_I)}. \qquad (48)
$$

A correction factor between the gain $g_c$ and the estimated one $g_c'$ is given by:

$$
\gamma_{gc} = g_c / g_c'. \qquad (49)
$$

Note that the prediction error is given by:

$$
R(n) = E(n) - \tilde{E}(n) = 20\log(\gamma_{gc}). \qquad (50)
$$

The correction factor $\gamma_{gc}$ is quantified using a 5-bit codebook. The quantization table search is performed by minimizing the error:

$$
E_Q = \left(g_c - \hat{\gamma}_{gc}\, g_c'\right)^2. \qquad (51)
$$

Once the optimum value $\hat{\gamma}_{gc}$ is chosen, the quantified fixed codebook gain is given by $\hat{g}_c = \hat{\gamma}_{gc}\, g_c'$.
5.9 Memory update

An update of the states of the synthesis and weighting filters is needed in order to compute the target signal in the next subframe.

After the two gains are quantified, the excitation signal, $u(n)$, in the present subframe is found by:

$$
u(n) = \hat{g}_p\, v(n) + \hat{g}_c\, c(n), \quad n = 0,\ldots,39, \qquad (52)
$$

where $\hat{g}_p$ and $\hat{g}_c$ are the quantified adaptive and fixed codebook gains, respectively, $v(n)$ the adaptive codebook vector (interpolated past excitation), and $c(n)$ is the fixed codebook vector (algebraic code including pitch sharpening). The states of the filters can be updated by filtering the signal $res_{LP}(n) - u(n)$ (difference between residual and excitation) through the filters $1/A(z)$ and $A(z/\gamma_1)/A(z/\gamma_2)$ for the 40-sample subframe and saving the states of the filters. This would require 3 filterings. A simpler approach which requires only one filtering is as follows. The local synthesized speech, $\hat{s}(n)$, is computed by filtering the excitation signal through $1/A(z)$. The output of the filter due to the input $res_{LP}(n) - u(n)$ is equivalent to $e(n) = s(n) - \hat{s}(n)$. So the states of the synthesis filter $1/A(z)$ are given by $e(n)$, $n = 30,\ldots,39$. Updating the states of the filter $A(z/\gamma_1)/A(z/\gamma_2)$ can be done by filtering the error signal $e(n)$ through this filter to find the perceptually weighted error $e_w(n)$. However, the signal $e_w(n)$ can be equivalently found by:

$$
e_w(n) = x(n) - \hat{g}_p\, y(n) - \hat{g}_c\, z(n). \qquad (53)
$$

Since the signals $x(n)$, $y(n)$, and $z(n)$ are available, the states of the weighting filter are updated by computing $e_w(n)$ as in equation (53) for $n = 30,\ldots,39$. This saves two filterings.



6 Functional description of the decoder



The function of the decoder consists of decoding the transmitted parameters (LP parameters, adaptive codebook vector, 
adaptive codebook gain, fixed codebook vector, fixed codebook gain) and performing synthesis to obtain the 
reconstructed speech. The reconstructed speech is then post-filtered and upscaled. The signal flow at the decoder is 
shown in figure 4. 

6.1 Decoding and speech synthesis 

The decoding process is performed in the following order: 

Decoding of LP filter parameters: The received indices of LSP quantization are used to reconstruct the two quantified LSP vectors. The interpolation described in clause 5.2.6 is performed to obtain 4 interpolated LSP vectors (corresponding to 4 subframes). For each subframe, the interpolated LSP vector is converted to the LP filter coefficient domain $\hat{a}_k$, which is used for synthesizing the reconstructed speech in the subframe.

The following steps are repeated for each subframe: 

1) Decoding of the adaptive codebook vector: The received pitch index (adaptive codebook index) is used to find 
the integer and fractional parts of the pitch lag. The adaptive codebook vector v(n) is found by interpolating the 
past excitation u(n) (at the pitch delay) using the FIR filter described in clause 5.6. 

2) Decoding of the adaptive codebook gain: The received index is used to readily find the quantified adaptive codebook gain, $\hat{g}_p$, from the quantization table.
3) Decoding of the innovative codebook vector: The received algebraic codebook index is used to extract the positions and amplitudes (signs) of the excitation pulses and to find the algebraic codevector $c(n)$. If the integer part of the pitch lag is less than the subframe size 40, the pitch sharpening procedure is applied which translates into modifying $c(n)$ by $c(n) = c(n) + \beta c(n-T)$, where $\beta$ is the decoded pitch gain, $\hat{g}_p$, bounded by [0.0, 1.0].

4) Decoding of the fixed codebook gain: The received index gives the fixed codebook gain correction factor $\hat{\gamma}_{gc}$. The estimated fixed codebook gain $g_c'$ is found as described in clause 5.8. First, the predicted energy is found by:

$$
\tilde{E}(n) = \sum_{i=1}^{4} b_i\, \hat{R}(n-i) \qquad (54)
$$

and then the mean innovation energy is found by:

$$
E_I = 10\log\left(\frac{1}{N} \sum_{j=0}^{N-1} c^2(j)\right). \qquad (55)
$$

The predicted gain $g_c'$ is found by:

$$
g_c' = 10^{0.05(\tilde{E}(n) + \bar{E} - E_I)}. \qquad (56)
$$

The quantified fixed codebook gain is given by:

$$
\hat{g}_c = \hat{\gamma}_{gc}\, g_c'. \qquad (57)
$$

5) Computing the reconstructed speech: The excitation at the input of the synthesis filter is given by:

$$
u(n) = \hat{g}_p\, v(n) + \hat{g}_c\, c(n). \qquad (58)
$$

Before the speech synthesis, a post-processing of excitation elements is performed. This means that the total excitation is modified by emphasizing the contribution of the adaptive codebook vector:

$$
\hat{u}(n) = \begin{cases}
u(n) + 0.25\, \beta\, \hat{g}_p\, v(n), & \hat{g}_p > 0.5, \\
u(n), & \hat{g}_p \le 0.5.
\end{cases} \qquad (59)
$$

Adaptive gain control (AGC) is used to compensate for the gain difference between the non-emphasized excitation $u(n)$ and the emphasized excitation $\hat{u}(n)$. The gain scaling factor $\eta$ for the emphasized excitation is computed by:

$$
\eta = \begin{cases}
\sqrt{\dfrac{\sum_{n=0}^{39} u^2(n)}{\sum_{n=0}^{39} \hat{u}^2(n)}}, & \hat{g}_p > 0.5, \\[8pt]
1.0, & \hat{g}_p \le 0.5.
\end{cases} \qquad (60)
$$

The gain-scaled emphasized excitation signal $\hat{u}'(n)$ is given by:

$$
\hat{u}'(n) = \hat{u}(n)\, \eta. \qquad (61)
$$

The reconstructed speech for the subframe of size 40 is given by:

$$
\hat{s}(n) = \hat{u}'(n) - \sum_{i=1}^{10} \hat{a}_i\, \hat{s}(n-i), \quad n = 0,\ldots,39, \qquad (62)
$$

where $\hat{a}_i$ are the interpolated LP filter coefficients. The synthesized speech $\hat{s}(n)$ is then passed through an adaptive postfilter which is described in the following clause.

6.2 Post-processing 

Post-processing consists of two functions: adaptive post-filtering and signal up-scaling.

6.2.1 Adaptive post-filtering 

The adaptive postfilter is the cascade of two filters: a formant postfilter and a tilt compensation filter. The postfilter is updated every subframe of 5 ms.

The formant postfilter is given by:

$$
H_f(z) = \frac{\hat{A}(z/\gamma_n)}{\hat{A}(z/\gamma_d)} \qquad (63)
$$

where $\hat{A}(z)$ is the received quantified (and interpolated) LP inverse filter (LP analysis is not performed at the decoder), and the factors $\gamma_n$ and $\gamma_d$ control the amount of the formant post-filtering.

Finally, the filter $H_t(z)$ compensates for the tilt in the formant postfilter $H_f(z)$ and is given by:

$$
H_t(z) = 1 - \mu z^{-1} \qquad (64)
$$

where $\mu = \gamma_t k_1'$ is a tilt factor, with $k_1'$ being the first reflection coefficient calculated on the truncated ($L_h = 22$) impulse response, $h_f(n)$, of the filter $\hat{A}(z/\gamma_n)/\hat{A}(z/\gamma_d)$. $k_1'$ is given by:

$$
k_1' = \frac{r_h(1)}{r_h(0)}; \qquad r_h(i) = \sum_{j=0}^{L_h - i - 1} h_f(j)\, h_f(j+i). \qquad (65)
$$

The post-filtering process is performed as follows. First, the synthesized speech $\hat{s}(n)$ is inverse filtered through $\hat{A}(z/\gamma_n)$ to produce the residual signal $\hat{r}(n)$. The signal $\hat{r}(n)$ is filtered by the synthesis filter $1/\hat{A}(z/\gamma_d)$. Finally, the signal at the output of the synthesis filter $1/\hat{A}(z/\gamma_d)$ is passed to the tilt compensation filter $H_t(z)$, resulting in the post-filtered speech signal $\hat{s}_f(n)$.

Adaptive gain control (AGC) is used to compensate for the gain difference between the synthesized speech signal $\hat{s}(n)$ and the post-filtered signal $\hat{s}_f(n)$. The gain scaling factor $\gamma_{sc}$ for the present subframe is computed by:

$$
\gamma_{sc} = \sqrt{\frac{\sum_{n=0}^{39} \hat{s}^2(n)}{\sum_{n=0}^{39} \hat{s}_f^2(n)}}. \qquad (66)
$$
The gain-scaled post-filtered signal $\hat{s}'(n)$ is given by:

$$
\hat{s}'(n) = \beta_{sc}(n)\, \hat{s}_f(n) \qquad (67)
$$

where $\beta_{sc}(n)$ is updated on a sample-by-sample basis and given by:

$$
\beta_{sc}(n) = \alpha\, \beta_{sc}(n-1) + (1 - \alpha)\, \gamma_{sc} \qquad (68)
$$

where $\alpha$ is an AGC factor with a value of 0.9.

The adaptive post-filtering factors are given by $\gamma_n = 0.7$, $\gamma_d = 0.75$ and

$$
\gamma_t = \begin{cases}
0.8, & k_1' > 0, \\
0, & \text{otherwise}.
\end{cases} \qquad (69)
$$



6.2.2 Up-scaling 



Up-scaling consists of multiplying the post-filtered speech by a factor of 2 to compensate for the down-scaling by 2 
which is applied to the input signal. 



7 Variables, constants and tables in the C-code of the GSM EFR codec

The various components of the 12,2 kbit/s GSM enhanced full rate codec are described in the form of a fixed-point 
bit-exact ANSI C code, which is found in GSM 06.53 [6]. This C simulation is an integrated software of the speech 
codec, VAD/DTX, comfort noise and bad frame handler functions. In the fixed-point ANSI C simulation, all the 
computations are performed using a predefined set of basic operators. 

Two types of variables are used in the fixed-point implementation. These two types are signed integers in 2's 
complement representation, defined by: 

Word16    16 bit variables

Word32    32 bit variables

The variables of the Word16 type are denoted var1, var2,..., varn, and those of type Word32 are denoted L_var1, L_var2,..., L_varn.
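To illustrate the Word16/Word32 conventions and Q15 precision, the following sketch shows saturating 16-bit addition and Q15 products. These functions are hypothetical stand-ins written for illustration; the normative basic operators are those defined with the bit-exact C code. The sketch assumes that a right shift of a negative value is arithmetic, as on common compilers.

```c
#include <stdint.h>

typedef int16_t Word16;   /* 16-bit two's complement */
typedef int32_t Word32;   /* 32-bit two's complement */

/* Clamps a 32-bit intermediate to the Word16 range. */
static Word16 sat16(Word32 x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (Word16)x;
}

/* Saturating 16-bit addition. */
Word16 add16_sat(Word16 var1, Word16 var2)
{
    return sat16((Word32)var1 + var2);
}

/* Q15 x Q15 -> Q15, truncating; (-1) * (-1) saturates to 32767. */
Word16 mult_q15(Word16 var1, Word16 var2)
{
    return sat16(((Word32)var1 * var2) >> 15);
}

/* Q15 x Q15 -> Q31; only (-1) * (-1) overflows and is saturated. */
Word32 lmult_q31(Word16 var1, Word16 var2)
{
    Word32 p = (Word32)var1 * var2;         /* |p| <= 2^30 */
    if (p == 0x40000000) return INT32_MAX;
    return p * 2;
}
```

For example, `mult_q15(26214, 16384)` multiplies 0.8 (the value of MU in table 3) by 0.5 in Q15 and yields 13107, that is 0.4.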

7.1 Description of the constants and variables used in the C code

The ANSI C code simulation of the codec is, to a large extent, self-documented. However, a description of the variables and constants used in the code is given to facilitate the understanding of the code. The fixed-point precision (in terms of Q format, double precision (DP), or normalized precision) of the vectors and variables is given, along with the vector dimensions and constant values.

Table 3 gives the coder global constants and table 4 describes the variables and vectors used in the encoder routine with 
their precision. Table 5 describes the fixed tables in the codec. 
Table 3: Codec global constants

| Parameter | Value | Description |
|---|---|---|
| L_TOTAL | 240 | size of speech buffer |
| L_WINDOW | 240 | size of LP analysis window |
| L_FRAME | 160 | size of speech frame |
| L_FRAME_BY2 | 80 | half the speech frame size |
| L_SUBFR | 40 | size of subframe |
| M | 10 | order of LP analysis |
| MP1 | 11 | M+1 |
| AZ_SIZE | 44 | 4*M+4 |
| PIT_MAX | 143 | maximum pitch lag |
| PIT_MIN | 18 | minimum pitch lag |
| L_INTERPOL | 10 | the order of the sinc filter for interpolating the excitation is 2*L_INTERPOL*6+1 |
| PRM_SIZE | 57 | size of vector of analysis parameters |
| SERIAL_SIZE | 245 | number of speech bits + bfi |
| MU | 26214 | tilt compensation filter factor (0.8 in Q15) |
| AGC_FAC | 29491 | automatic gain control factor (0.9 in Q15) |
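Table 3 gives MU and AGC_FAC both as integers and as real values in Q15. The conversion can be reproduced with the sketch below; the helper names to_q and from_q are hypothetical, and round-to-nearest with saturation is an assumed convention (plain truncation gives the same integers for these two constants).

```c
#include <stdint.h>

/* Convert a real value to a 16-bit fixed-point number with frac_bits
   fractional bits (15 for Q15, 12 for Q12), rounding to nearest and
   saturating at the int16_t limits. */
int16_t to_q(double x, int frac_bits)
{
    double scaled = x * (double)(1L << frac_bits);
    if (scaled >= (double)INT16_MAX) return INT16_MAX;
    if (scaled <= (double)INT16_MIN) return INT16_MIN;
    long r = (scaled >= 0.0) ? (long)(scaled + 0.5) : -(long)(-scaled + 0.5);
    return (int16_t)r;
}

/* Interpret a 16-bit fixed-point number with frac_bits fractional bits. */
double from_q(int16_t v, int frac_bits)
{
    return (double)v / (double)(1L << frac_bits);
}
```

For example, to_q(0.8, 15) yields 26214 (MU) and to_q(0.9, 15) yields 29491 (AGC_FAC). Read the other way, from_q(4096, 12) is 1.0, which is consistent with the home-state value old_A[0] = 4096 if LP coefficients are held in Q12, as Table 4 lists for the LP coefficient arrays.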



Table 4: Description of the coder vectors and variables

| Parameter | Size | Precision | Description |
|---|---|---|---|
| speech | -80..159 | Q0 | speech buffer |
| wsp | -143..159 | Q0 | weighted speech buffer |
| exc | -(143+11)..159 | Q0 | LP excitation |
| F_gamma1 | 0..9 | Q15 | spectral expansion factors |
| F_gamma2 | 0..9 | Q15 | spectral expansion factors |
| lsp_old | 0..9 | Q15 | LSP vector in past frame |
| lsp_old_q | 0..9 | Q15 | quantized LSP vector in past frame |
| mem_syn | 0..9 | Q0 | memory of synthesis filter |
| mem_w | 0..9 | Q0 | memory of weighting filter (applied to input) |
| mem_w0 | 0..9 | Q0 | memory of weighting filter (applied to error) |
| error | -10..39 | Q0 | error signal (input minus synthesized speech) |
| r_l & r_h | 0..10 | normalized DP | correlations of windowed speech (low and high parts) |
| A_t | 11x4 | Q12 | LP filter coefficients in 4 subframes |
| Aq_t | 11x4 | Q12 | quantized LP filter coefficients in 4 subframes |
| Ap1 | 0..10 | Q12 | LP coefficients with spectral expansion |
| Ap2 | 0..10 | Q12 | LP coefficients with spectral expansion |
| lsp_new | 0..9 | Q15 | LSP vector in 4th subframe |
| lsp_new_q | 0..9 | Q15 | quantized LSP vector in 4th subframe |
| lsp_mid | 0..9 | Q15 | LSP vector in 2nd subframe |
| lsp_mid_q | 0..9 | Q15 | quantized LSP vector in 2nd subframe |
| code | 0..39 | Q12 | fixed codebook excitation vector |
| h1 | 0..39 | Q12 | impulse response of weighted synthesis filter |
| xn | 0..39 | Q0 | target vector in pitch search |
| xn2 | 0..39 | Q0 | target vector in algebraic codebook search |
| dn | 0..39 | scaled (max < 8192) | backward filtered target vector |
| y1 | 0..39 | Q0 | filtered adaptive codebook vector |
| y2 | 0..39 | Q12 | filtered fixed codebook vector |
| zero | 0..39 | | zero vector |
| res2 | 0..39 | | long-term prediction residual |
| gain_pit | scalar | Q12 | adaptive codebook gain |
| gain_code | scalar | Q0 | algebraic codebook gain |
Table 5: Codec fixed tables

| Parameter | Size | Precision | Description |
|---|---|---|---|
| grid[] | 61 | Q15 | grid points at which Chebyshev polynomials are evaluated |
| lag_h[] and lag_l[] | 10 | DP | higher and lower parts of the lag window table |
| window_160_80[] | 240 | Q15 | 1st LP analysis window |
| window_232_8[] | 240 | Q15 | 2nd LP analysis window |
| table[] in Lsf_lsp() | 65 | Q15 | table to compute cos(x) in Lsf_lsp() |
| slope[] in Lsp_lsf() | 64 | Q12 | table to compute acos(x) in Lsp_lsf() |
| table[] in Inv_sqrt() | 49 | | table used in inverse square root computation |
| table[] in Log2() | 33 | | table used in base 2 logarithm computation |
| table[] in Pow2() | 33 | | table used in 2 to the power computation |
| mean_lsf[] | 10 | Q15 | LSF means in normalized frequency [0.0, 0.5] |
| dico1_lsf[] | 128x4 | Q15 | 1st LSF quantizer in normalized frequency [0.0, 0.5] |
| dico2_lsf[] | 256x4 | Q15 | 2nd LSF quantizer in normalized frequency [0.0, 0.5] |
| dico3_lsf[] | 256x4 | Q15 | 3rd LSF quantizer in normalized frequency [0.0, 0.5] |
| dico4_lsf[] | 256x4 | Q15 | 4th LSF quantizer in normalized frequency [0.0, 0.5] |
| dico5_lsf[] | 64x4 | Q15 | 5th LSF quantizer in normalized frequency [0.0, 0.5] |
| qua_gain_pitch[] | 16 | Q14 | quantization table of adaptive codebook gain |
| qua_gain_code[] | 32 | Q11 | quantization table of fixed codebook gain |
| inter_6[] in Interpol_6() | 25 | Q15 | interpolation filter coefficients in Interpol_6() |
| inter_6[] in Pred_lt_6() | 61 | Q15 | interpolation filter coefficients in Pred_lt_6() |
| b[] | 3 | Q12 | HP filter coefficients (numerator) in Pre_Process() |
| a[] | 3 | Q12 | HP filter coefficients (denominator) in Pre_Process() |
| bitno[] | 57 | Q0 | number of bits corresponding to transmitted parameters |



Table 6: Source Encoder output parameters in order of occurrence and bit allocation within the speech frame of 244 bits/20 ms

| Bits (MSB-LSB) | Description |
|---|---|
| s1 - s7 | index of 1st LSF submatrix |
| s8 - s15 | index of 2nd LSF submatrix |
| s16 - s23 | index of 3rd LSF submatrix |
| s24 | sign of 3rd LSF submatrix |
| s25 - s32 | index of 4th LSF submatrix |
| s33 - s38 | index of 5th LSF submatrix |
| subframe 1 | |
| s39 - s47 | adaptive codebook index |
| s48 - s51 | adaptive codebook gain |
| s52 | sign information for 1st and 6th pulses |
| s53 - s55 | position of 1st pulse |
| s56 | sign information for 2nd and 7th pulses |
| s57 - s59 | position of 2nd pulse |
| s60 | sign information for 3rd and 8th pulses |
| s61 - s63 | position of 3rd pulse |
| s64 | sign information for 4th and 9th pulses |
| s65 - s67 | position of 4th pulse |
| s68 | sign information for 5th and 10th pulses |
| s69 - s71 | position of 5th pulse |
| s72 - s74 | position of 6th pulse |
| s75 - s77 | position of 7th pulse |
| s78 - s80 | position of 8th pulse |
| s81 - s83 | position of 9th pulse |
| s84 - s86 | position of 10th pulse |
| s87 - s91 | fixed codebook gain |
| subframe 2 | |
| s92 - s97 | adaptive codebook index (relative) |
| s98 - s141 | same description as s48 - s91 |
| subframe 3 | |
| s142 - s194 | same description as s39 - s91 |
| subframe 4 | |
| s195 - s244 | same description as s92 - s141 |
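To make the MSB-first convention of Table 6 concrete, the sketch below packs and unpacks fields by serial bit number. It assumes a hypothetical layout with one bit per array element and bits[0] holding s1; the helper names and the SUBFR1_FIELDS transcription are illustrative and are not taken from the reference code.

```c
#include <stddef.h>
#include <stdint.h>

#define SERIAL_BITS 244

/* Return the field s_first..s_last as an unsigned value, s_first being the MSB.
   bits[k] holds serial bit s(k+1) as 0 or 1. */
uint16_t read_field(const uint8_t bits[SERIAL_BITS], int first, int last)
{
    uint16_t v = 0;
    for (int s = first; s <= last; s++)
        v = (uint16_t)((v << 1) | (bits[s - 1] & 1u));
    return v;
}

/* Inverse of read_field: store value into s_first..s_last, MSB first. */
void write_field(uint8_t bits[SERIAL_BITS], int first, int last, uint16_t value)
{
    for (int s = last; s >= first; s--) {
        bits[s - 1] = (uint8_t)(value & 1u);
        value >>= 1;
    }
}

/* Subframe 1 fields of Table 6 as (first, last) serial-bit ranges. */
static const int SUBFR1_FIELDS[][2] = {
    {39, 47},                          /* adaptive codebook index        */
    {48, 51},                          /* adaptive codebook gain         */
    {52, 52}, {53, 55},                /* sign 1st/6th, position 1st     */
    {56, 56}, {57, 59},                /* sign 2nd/7th, position 2nd     */
    {60, 60}, {61, 63},                /* sign 3rd/8th, position 3rd     */
    {64, 64}, {65, 67},                /* sign 4th/9th, position 4th     */
    {68, 68}, {69, 71},                /* sign 5th/10th, position 5th    */
    {72, 74}, {75, 77}, {78, 80},      /* positions of 6th to 8th pulses */
    {81, 83}, {84, 86},                /* positions of 9th, 10th pulses  */
    {87, 91}                           /* fixed codebook gain            */
};

/* Total width of the subframe 1 fields. */
int subfr1_bit_count(void)
{
    int n = 0;
    for (size_t i = 0; i < sizeof SUBFR1_FIELDS / sizeof SUBFR1_FIELDS[0]; i++)
        n += SUBFR1_FIELDS[i][1] - SUBFR1_FIELDS[i][0] + 1;
    return n;
}
```

Summing the subframe 1 ranges gives 53 bits (s39 to s91); applying the same count to the other rows of Table 6 gives 38 + 53 + 50 + 53 + 50 = 244 bits per frame.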






3GPP TS 46.060 version 11.0.0 Release 11 33 ETSI TS 146 060 V11.0.0 (2012-10) 

8 Homing sequences 

8.1 Functional description 

The enhanced full rate speech codec is described in bit-exact arithmetic to allow for easy type approval as well as for general testing of the codec.

The response of the codec to a predefined input sequence can only be foreseen if the internal state variables of the codec are in a predefined state at the beginning of the experiment. Therefore, the codec has to be put in a so-called home state before a bit-exact test can be performed. This is usually done by a reset (a procedure in which the internal state variables of the codec are set to their defined initial values).

To allow a reset of the codec in remote locations, special homing frames have been defined for the encoder and the 
decoder, thus enabling a codec homing by inband signalling. 

The codec homing procedure is defined in such a way that, in either direction (encoder or decoder), the homing functions are called after the input homing frame has been processed. The output corresponding to the first homing frame therefore depends on the codec state when that frame is received and is hence usually not known. The response to any further homing frame in one direction is by definition a homing frame of the other direction. This procedure allows homing of both the encoder and the decoder from either side if a loop-back configuration is implemented, taking proper framing into account.

8.2 Definitions 

Encoder homing frame: The encoder homing frame consists of 160 identical samples, each 13 bits long, with the least 
significant bit set to "one" and all other bits set to "zero". When written to 16-bit words with left justification, the 
samples have a value of 0008 hex. The speech decoder has to produce this frame as a response to the second and any 
further decoder homing frame if at least two decoder homing frames were input to the decoder consecutively. 
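The value 0008 hex follows from left justification: a 13-bit sample occupies the upper 13 bits of the 16-bit word, so the 13-bit value 1 becomes 1 × 2^3 = 8. A minimal sketch, with hypothetical helper names and an arithmetic right shift assumed for negative words:

```c
#include <stdint.h>

#define L_FRAME 160

/* Place a 13-bit two's complement value in the upper 13 bits of a 16-bit word. */
int16_t left_justify13(int16_t v13)
{
    return (int16_t)(v13 * 8);
}

/* Recover the 13-bit value from a left-justified 16-bit word. */
int16_t right_justify13(int16_t w)
{
    return (int16_t)(w >> 3);
}

/* Fill a frame with the encoder homing pattern: 160 samples whose
   13-bit value has only the least significant bit set. */
void make_encoder_homing_frame(int16_t frame[L_FRAME])
{
    for (int i = 0; i < L_FRAME; i++)
        frame[i] = left_justify13(1);
}
```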

Decoder homing frame: The decoder homing frame has a fixed set of speech parameters as described in table 7. It is the natural response of the speech encoder to the second and any further encoder homing frame if at least two encoder homing frames were input to the encoder consecutively.
Table 7: Parameter values for the decoder homing frame

| Parameter | Value (LSB=b0) |
|---|---|
| LPC1 | 0x0004 |
| LPC2 | 0x002F |
| LPC3 | 0x00B4 |
| LPC4 | 0x0090 |
| LPC5 | 0x003E |
| LTP-LAG 1 | 0x0156 |
| LTP-LAG 2 | 0x0036 |
| LTP-LAG 3 | 0x0156 |
| LTP-LAG 4 | 0x0036 |
| LTP-GAIN 1 | 0x000B |
| LTP-GAIN 2 | 0x0001 |
| LTP-GAIN 3 | 0x0000 |
| LTP-GAIN 4 | 0x000B |
| FCB-GAIN 1 | 0x0003 |
| FCB-GAIN 2 | 0x0000 |
| FCB-GAIN 3 | 0x0000 |
| FCB-GAIN 4 | 0x0000 |
| PULSE1_1 | 0x0000 |
| PULSE1_2 | 0x0001 |
| PULSE1_3 | 0x000F |
| PULSE1_4 | 0x0001 |
| PULSE1_5 | 0x000D |
| PULSE1_6 | 0x0000 |
| PULSE1_7 | 0x0003 |
| PULSE1_8 | 0x0000 |
| PULSE1_9 | 0x0003 |
| PULSE1_10 | 0x0000 |
| PULSE2_1 | 0x0008 |
| PULSE2_2 | 0x0008 |
| PULSE2_3 | 0x0005 |
| PULSE2_4 | 0x0008 |
| PULSE2_5 | 0x0001 |
| PULSE2_6 | 0x0000 |
| PULSE2_7 | 0x0000 |
| PULSE2_8 | 0x0001 |
| PULSE2_9 | 0x0001 |
| PULSE2_10 | 0x0000 |
| PULSE3_1 | 0x0000 |
| PULSE3_2 | 0x0000 |
| PULSE3_3 | 0x0000 |
| PULSE3_4 | 0x0000 |
| PULSE3_5 | 0x0000 |
| PULSE3_6 | 0x0000 |
| PULSE3_7 | 0x0000 |
| PULSE3_8 | 0x0000 |
| PULSE3_9 | 0x0000 |
| PULSE3_10 | 0x0000 |
| PULSE4_1 | 0x0000 |
| PULSE4_2 | 0x0000 |
| PULSE4_3 | 0x0000 |
| PULSE4_4 | 0x0000 |
| PULSE4_5 | 0x0000 |
| PULSE4_6 | 0x0000 |
| PULSE4_7 | 0x0000 |
| PULSE4_8 | 0x0000 |
| PULSE4_9 | 0x0000 |
| PULSE4_10 | 0x0000 |
8.3 Encoder homing

Whenever the enhanced full rate speech encoder receives at its input an encoder homing frame exactly aligned with its internal speech frame segmentation, the following events take place:

Step 1: The speech encoder performs its normal operation including VAD and DTX and produces a speech parameter frame at its output, which is in general unknown. But if the speech encoder was in its home state at the beginning of that frame, then the resulting speech parameter frame is identical to the decoder homing frame (this is how the decoder homing frame was constructed).

Step 2: After successful termination of that operation, the speech encoder invokes the homing functions for all sub-modules including VAD and DTX and sets all state variables into their home state. On reception of the next input frame, the speech encoder will start from its home state.

NOTE: Applying a sequence of N encoder homing frames will cause at least N-1 decoder homing frames at the output of the speech encoder.
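The ordering in steps 1 and 2 (encode first, reset afterwards) is what makes the response to the first homing frame state-dependent and the response to later ones fixed. The sketch below models this with a deliberately trivial encoder stub; EncState, encode_frame and encoder_home are hypothetical stand-ins for the real encoder state, encoding routine and homing functions, not names from the reference code.

```c
#include <stdint.h>

#define L_FRAME  160
#define PRM_SIZE 57

/* Hypothetical stand-in for the complete encoder state of Table 8. */
typedef struct {
    int16_t history;
} EncState;

/* Encoder homing frame test: all 160 samples equal 0x0008. */
static int is_encoder_homing_frame(const int16_t in[L_FRAME])
{
    for (int i = 0; i < L_FRAME; i++)
        if (in[i] != 0x0008) return 0;
    return 1;
}

/* Hypothetical homing function: puts the state into its home value. */
static void encoder_home(EncState *st)
{
    st->history = 0;
}

/* Hypothetical encoder stub: its output depends on the state on entry,
   so the parameters are only predictable when starting from the home state. */
static void encode_frame(EncState *st, const int16_t in[L_FRAME], int16_t prm[PRM_SIZE])
{
    for (int i = 0; i < PRM_SIZE; i++)
        prm[i] = st->history;
    st->history = (int16_t)(st->history + in[0]);
}

/* Step 1: encode normally; step 2: home the encoder if the input was a homing frame. */
void encoder_process(EncState *st, const int16_t in[L_FRAME], int16_t prm[PRM_SIZE])
{
    int homing = is_encoder_homing_frame(in);  /* tested on the unmodified input */
    encode_frame(st, in, prm);                 /* step 1 */
    if (homing)
        encoder_home(st);                      /* step 2 */
}
```

Starting from a non-home state, the first homing frame yields parameters that reflect that state; every further homing frame is encoded from the home state and therefore yields the same, known output.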

8.4 Decoder homing

Whenever the speech decoder receives a decoder homing frame at its input, the following events take place:

Step 1: The speech decoder performs its normal operation and produces a speech frame at its output, which is in general unknown. But if the speech decoder was in its home state at the beginning of that frame, then the resulting speech frame is replaced by the encoder homing frame. This would not naturally be the case but is forced by this definition.

Step 2: After successful termination of that operation, the speech decoder invokes the homing functions for all sub-modules including the comfort noise generator and sets all state variables into their home state. On reception of the next input frame, the speech decoder will start from its home state.

NOTE 1: Applying a sequence of N decoder homing frames will cause at least N-1 encoder homing frames at the output of the speech decoder.

NOTE 2: By definition (!) the first frame of each decoder test sequence must differ from the decoder homing frame in at least one bit position within the parameters for LPC and the first subframe. Therefore, if the decoder is in its home state, it is sufficient to check only these parameters to detect a subsequent decoder homing frame. This definition is made to support a delay-optimized implementation in the TRAU uplink direction.
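NOTE 2 allows a decoder in its home state to recognize a homing frame from the LPC and first-subframe parameters alone. The sketch below holds the table 7 values in a hypothetical EfrParams structure grouped by parameter name (not the parameter ordering of the reference code) and provides both the partial and the full comparison; all names are illustrative.

```c
#include <stdint.h>

/* Hypothetical grouping of the 57 speech parameters by table 7 names. */
typedef struct {
    int16_t lpc[5];        /* LPC1..LPC5            */
    int16_t ltp_lag[4];    /* LTP-LAG 1..4          */
    int16_t ltp_gain[4];   /* LTP-GAIN 1..4         */
    int16_t fcb_gain[4];   /* FCB-GAIN 1..4         */
    int16_t pulse[4][10];  /* PULSE1_1 .. PULSE4_10 */
} EfrParams;

/* Decoder homing frame, transcribed from table 7. */
static const EfrParams DHF = {
    {0x0004, 0x002F, 0x00B4, 0x0090, 0x003E},
    {0x0156, 0x0036, 0x0156, 0x0036},
    {0x000B, 0x0001, 0x0000, 0x000B},
    {0x0003, 0x0000, 0x0000, 0x0000},
    {
        {0x0000, 0x0001, 0x000F, 0x0001, 0x000D, 0x0000, 0x0003, 0x0000, 0x0003, 0x0000},
        {0x0008, 0x0008, 0x0005, 0x0008, 0x0001, 0x0000, 0x0000, 0x0001, 0x0001, 0x0000},
        {0},  /* PULSE3_1..PULSE3_10 all zero */
        {0}   /* PULSE4_1..PULSE4_10 all zero */
    }
};

static int same(const int16_t *a, const int16_t *b, int n)
{
    for (int i = 0; i < n; i++)
        if (a[i] != b[i]) return 0;
    return 1;
}

/* Partial test of NOTE 2: LPC parameters and first subframe only. */
int dhf_first_subframe_match(const EfrParams *p)
{
    return same(p->lpc, DHF.lpc, 5)
        && p->ltp_lag[0] == DHF.ltp_lag[0]
        && p->ltp_gain[0] == DHF.ltp_gain[0]
        && p->fcb_gain[0] == DHF.fcb_gain[0]
        && same(p->pulse[0], DHF.pulse[0], 10);
}

/* Full test: all 57 parameters must equal the decoder homing frame. */
int dhf_full_match(const EfrParams *p)
{
    if (!dhf_first_subframe_match(p)) return 0;
    for (int k = 1; k < 4; k++) {
        if (p->ltp_lag[k] != DHF.ltp_lag[k] || p->ltp_gain[k] != DHF.ltp_gain[k] ||
            p->fcb_gain[k] != DHF.fcb_gain[k] || !same(p->pulse[k], DHF.pulse[k], 10))
            return 0;
    }
    return 1;
}
```

Under NOTE 2, a decoder in its home state may act on dhf_first_subframe_match() without waiting for subframes 2 to 4; the full comparison remains the general definition of the decoder homing frame.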
8.5 Encoder home state

Table 8 lists all the encoder state variables together with their predefined values in the home state.

Table 8: Initial values of the encoder state variables

| File | Variable | Initial value |
|---|---|---|
| cod_12k2.c | old_speech[0:319] | All set to 0 |
| | old_exc[0:153] | All set to 0 |
| | old_wsp[0:142] | All set to 0 |
| | mem_syn[0:9] | All set to 0 |
| | mem_w[0:9] | All set to 0 |
| | mem_w0[0:9] | All set to 0 |
| | mem_err[0:9] | All set to 0 |
| | ai_zero[11:50] | All set to 0 |
| | hvec[0:39] | All set to 0 |
| | lsp_old[0], lsp_old_q[0] | 30000 |
| | lsp_old[1], lsp_old_q[1] | 26000 |
| | lsp_old[2], lsp_old_q[2] | 21000 |
| | lsp_old[3], lsp_old_q[3] | 15000 |
| | lsp_old[4], lsp_old_q[4] | 8000 |
| | lsp_old[5], lsp_old_q[5] | 0 |
| | lsp_old[6], lsp_old_q[6] | -8000 |
| | lsp_old[7], lsp_old_q[7] | -15000 |
| | lsp_old[8], lsp_old_q[8] | -21000 |
| | lsp_old[9], lsp_old_q[9] | -26000 |
| levinson.c | old_A[0] | 4096 |
| | old_A[1:10] | All set to 0 |
| pre_proc.c | y2_hi, y2_lo, y1_hi, y1_lo, x1, x0 | All set to 0 |
| q_plsf_5.c | past_r2_q[0:9] | All set to 0 |
| q_gains.c | past_qua_en[0:3] | All set to -2381 |
| | pred[0] | 44 |
| | pred[1] | 37 |
| | pred[2] | 22 |
| | pred[3] | 12 |
| dtx.c | txdtx_hangover | 7 |
| | txdtx_N_elapsed | 0x7fff |
| | txdtx_ctrl | 0x0003 |
| | old_CN_mem_tx[0:5] | All set to 0 |
| | lsf_old_tx[0:6][0] | 1384 |
| | lsf_old_tx[0:6][1] | 2077 |
| | lsf_old_tx[0:6][2] | 3420 |
| | lsf_old_tx[0:6][3] | 5108 |
| | lsf_old_tx[0:6][4] | 6742 |
| | lsf_old_tx[0:6][5] | 8122 |
| | lsf_old_tx[0:6][6] | 9863 |
| | lsf_old_tx[0:6][7] | 11092 |
| | lsf_old_tx[0:6][8] | 12714 |
| | lsf_old_tx[0:6][9] | 13701 |
| | gain_code_old_tx[0:27] | All set to 0 |
| | L_pn_seed_tx | 0x70816958 |
| | buf_p_tx | 0 |

Initial values for variables used by the VAD algorithm are listed in GSM 06.32 [4]. 
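A minimal sketch of restoring some explicitly listed encoder home-state values, using a hypothetical EncHomeSubset structure that covers only a few variables from table 8. The field types are assumptions, lsp_old[5] = lsp_old_q[5] = 0 and old_A[1:10] = 0 are assumed, and the VAD variables covered by GSM 06.32 are omitted.

```c
#include <stdint.h>
#include <string.h>

#define M 10

/* Hypothetical container for a subset of the encoder state in table 8. */
typedef struct {
    int16_t lsp_old[M];
    int16_t lsp_old_q[M];
    int16_t old_A[M + 1];
    int16_t past_qua_en[4];
    int16_t pred[4];
    int16_t txdtx_hangover;
    int16_t txdtx_N_elapsed;
    int16_t txdtx_ctrl;
    int32_t L_pn_seed_tx;
} EncHomeSubset;

static const int16_t LSP_HOME[M] = {30000, 26000, 21000, 15000, 8000,
                                    0, -8000, -15000, -21000, -26000};

/* Put the subset of state variables into their home values. */
void enc_home_subset(EncHomeSubset *st)
{
    memcpy(st->lsp_old, LSP_HOME, sizeof LSP_HOME);
    memcpy(st->lsp_old_q, LSP_HOME, sizeof LSP_HOME);

    memset(st->old_A, 0, sizeof st->old_A);
    st->old_A[0] = 4096;

    for (int i = 0; i < 4; i++)
        st->past_qua_en[i] = -2381;

    static const int16_t pred_home[4] = {44, 37, 22, 12};
    memcpy(st->pred, pred_home, sizeof pred_home);

    st->txdtx_hangover  = 7;
    st->txdtx_N_elapsed = 0x7fff;
    st->txdtx_ctrl      = 0x0003;
    st->L_pn_seed_tx    = 0x70816958L;
}
```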
8.6 Decoder home state

Table 9 lists all the decoder state variables together with their predefined values in the home state.

Table 9: Initial values of the decoder state variables

| File | Variable | Initial value |
|---|---|---|
| decoder.c | synth_buf[0:9] | All set to 0 |
| dec_12k2.c | old_exc[0:153] | All set to 0 |
| | mem_syn[0:9] | All set to 0 |
| | lsp_old[0] | 30000 |
| | lsp_old[1] | 26000 |
| | lsp_old[2] | 21000 |
| | lsp_old[3] | 15000 |
| | lsp_old[4] | 8000 |
| | lsp_old[5] | 0 |
| | lsp_old[6] | -8000 |
| | lsp_old[7] | -15000 |
| | lsp_old[8] | -21000 |
| | lsp_old[9] | -26000 |
| | prev_bf | 0 |
| | state | 0 |
| agc.c | past_gain | 4096 |
| d_plsf_5.c | past_r2_q[0:9] | All set to 0 |
| | past_lsf_q[0], lsf_p_CN[0], lsf_old_CN[0], lsf_new_CN[0] | 1384 |
| | past_lsf_q[1], lsf_p_CN[1], lsf_old_CN[1], lsf_new_CN[1] | 2077 |
| | past_lsf_q[2], lsf_p_CN[2], lsf_old_CN[2], lsf_new_CN[2] | 3420 |
| | past_lsf_q[3], lsf_p_CN[3], lsf_old_CN[3], lsf_new_CN[3] | 5108 |
| | past_lsf_q[4], lsf_p_CN[4], lsf_old_CN[4], lsf_new_CN[4] | 6742 |
| | past_lsf_q[5], lsf_p_CN[5], lsf_old_CN[5], lsf_new_CN[5] | 8122 |
| | past_lsf_q[6], lsf_p_CN[6], lsf_old_CN[6], lsf_new_CN[6] | 9863 |
| | past_lsf_q[7], lsf_p_CN[7], lsf_old_CN[7], lsf_new_CN[7] | 11092 |
| | past_lsf_q[8], lsf_p_CN[8], lsf_old_CN[8], lsf_new_CN[8] | 12714 |
| | past_lsf_q[9], lsf_p_CN[9], lsf_old_CN[9], lsf_new_CN[9] | 13701 |
| d_gains.c | pbuf[0:4] | All set to 410 |
| | gbuf[0:4] | All set to 1 |
| | past_gain_pit | 0 |
| | past_gain_code | 0 |
| | prev_gp | 4096 |
| | prev_gc | 1 |
| | gcode0_CN | 0 |
| | gain_code_old_CN | 0 |
| | gain_code_new_CN | 0 |
| | gain_code_muting_CN | 0 |
| | past_qua_en[0:3] | All set to -2381 |
| | pred[0] | 44 |
| | pred[1] | 37 |
| | pred[2] | 22 |
| | pred[3] | 12 |
| dtx.c | rxdtx_aver_period | 7 |
| | rxdtx_N_elapsed | 0x7fff |
| | rxdtx_ctrl | 0x0001 |
| | lsf_old_rx[0:6][0] | 1384 |
| | lsf_old_rx[0:6][1] | 2077 |
| | lsf_old_rx[0:6][2] | 3420 |
| | lsf_old_rx[0:6][3] | 5108 |
| | lsf_old_rx[0:6][4] | 6742 |
| | lsf_old_rx[0:6][5] | 8122 |
| | lsf_old_rx[0:6][6] | 9863 |
| | lsf_old_rx[0:6][7] | 11092 |
| | lsf_old_rx[0:6][8] | 12714 |
| | lsf_old_rx[0:6][9] | 13701 |
| | gain_code_old_rx[0:27] | All set to 0 |
| | L_pn_seed_rx | 0x70816958 |
| | rx_dtx_state | 23 |
| | prev_SID_frames_lost | 0 |
| | buf_p_rx | 0 |
| dec_lag6.c | old_T0 | 40 |
| preemph.c | mem_pre | 0 |
| pstfilt2.c | mem_syn_pst[0:9] | All set to 0 |
| | res2[0:39] | All set to 0 |
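In the decoder DTX state, each of the seven rows lsf_old_rx[0:6] starts from the same ten-element LSF vector, which is also the home value of past_lsf_q. A minimal sketch of that initialization, with a hypothetical function name and a plain 7 × 10 array assumed as the storage layout; the ordering check reflects the general property that the LSFs of a stable LP filter are strictly increasing.

```c
#include <stdint.h>

#define M        10
#define DTX_HIST 7   /* rows [0:6] of lsf_old_rx */

static const int16_t LSF_HOME[M] = {1384, 2077, 3420, 5108, 6742,
                                    8122, 9863, 11092, 12714, 13701};

/* Fill every row of the LSF history with the home-state vector. */
void home_lsf_history(int16_t lsf_old_rx[DTX_HIST][M])
{
    for (int r = 0; r < DTX_HIST; r++)
        for (int k = 0; k < M; k++)
            lsf_old_rx[r][k] = LSF_HOME[k];
}

/* Return 1 if the LSF vector is strictly increasing, 0 otherwise. */
int lsf_is_ordered(const int16_t lsf[M])
{
    for (int k = 1; k < M; k++)
        if (lsf[k] <= lsf[k - 1]) return 0;
    return 1;
}
```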
Figure 2: Simplified block diagram of the CELP synthesis model

Figure 3: Simplified block diagram of the GSM enhanced full rate encoder

Bibliography 



1) M.R. Schroeder and B.S. Atal, "Code-Excited Linear Prediction (CELP): High quality speech at very low bit 
rates,'" Proc. ICASSP'85, pp. 937-940, 1985. 

2) Y. Tohkura and F. Itakura, "Spectral smoothing technique in PARCOR speech analysis-synthesis," IEEE Trans. 
onASSP, vol. 26, no. 6, pp. 587-596, Dec. 1978. 

3) L.R. Rabiner and R.W. Schaefer. Digital processing of speech signals. Prentice-Hall Int., 1978. 

4) F. Itakura, "Line spectral representation of linear predictive coefficients of speech signals," J. Acoust. Soc. Amer, 
vol. 57, Supplement no. 1, S35, 1975. 

5) F.K. Soong and B.H. Juang, "Line spectrum pair (LSP) and speech data compression", Proc. ICASSP'84, pp. 
1.10.1-1.10.4, 1984. 

6) P. Kabal and R.P. Ramachandran, "The computation of line spectral frequencies using Chebyshev polynomials", 
IEEE Trans, on ASSP, vol. 34, no. 6, pp. 1419-1426, Dec. 1986. 

7) C. Laflamme, J-P. Adoul, R. Salami, S. Morissette, and P. Mabilleau, "16 kpbs wideband speech coding 
technique based on algebraic CELP" Proc. ICASSP'91, pp. 13-16. 






Annex A (informative): Change history

| SMG | SPEC | CR | PH | VER | NEW_VE | SUBJECT |
|---|---|---|---|---|---|---|
| s23 | 06.60 | A003 | 2 | 4.0.0 | 4.0.1 | Vote 115 comments |
| s25 | 06.60 | A005 | 2 | 4.0.1 | 4.1.0 | Corrections to GSM 06.60 |
| s28 | 06.60 | | | 4.1.0 | 6.0.0 | Release 1997 version |
| s28 | 06.60 | A007 | | 6.0.0 | 7.0.0 | Addition of mu-Law (PCS 1900) |
| | 06.60 | | | 7.0.1 | 7.0.2 | Update to Version 7.0.2 for Publication |
| s31 | 06.60 | | | 7.0.2 | 8.0.0 | Release 1999 version |
| | 06.60 | | | 8.0.0 | 8.0.1 | Update to Version 8.0.1 for Publication |

Change history

| Date | TSG# | TSG Doc. | CR | Rev | Subject/Comment | Old | New |
|---|---|---|---|---|---|---|---|
| 03-2001 | 11 | | | | Version for Release 4 | | 4.0.0 |
| 06-2002 | 16 | | | | Version for Release 5 | 4.0.0 | 5.0.0 |
| 12-2004 | 26 | | | | Version for Release 6 | 5.0.0 | 6.0.0 |
| 06-2007 | 36 | | | | Version for Release 7 | 6.0.0 | 7.0.0 |
| 12-2008 | 42 | | | | Version for Release 8 | 7.0.0 | 8.0.0 |
| 12-2009 | 46 | | | | Version for Release 9 | 8.0.0 | 9.0.0 |
| 03-2011 | 51 | | | | Version for Release 10 | 9.0.0 | 10.0.0 |
| 09-2012 | 57 | | | | Version for Release 11 | 10.0.0 | 11.0.0 |






History

| Document history | | |
|---|---|---|
| V11.0.0 | October 2012 | Publication |






























